Single molecule nucleic acid sequence analysis processes and compositions

ABSTRACT

Improved solid supports and methods for analyzing target nucleotide sequences are provided herein. Certain improvements are directed to efficiently preparing nucleic acids that comprise nucleotide sequences identical to or substantially identical to one or more target nucleotide sequences, or complement thereof. The prepared nucleic acids include a reference sequence that facilitates sequence analysis. The solid supports and methods provided herein minimize the number of steps required by published sequence analysis methodologies, and thereby offer improved sequence analysis efficiency.

RELATED APPLICATION

This patent application is a continuation of U.S. patent application Ser. No. 12/354,749, filed on Jan. 15, 2009, entitled “SINGLE MOLECULE NUCLEIC ACID SEQUENCE ANALYSIS PROCESSES AND COMPOSITIONS,” naming Charles R. Cantor as applicant, and designated by attorney docket no. SEQ-6012-UT, which claims the benefit of U.S. Provisional Patent Application No. 61/021,871, filed on Jan. 17, 2008, entitled “SINGLE MOLECULE NUCLEIC ACID SEQUENCE ANALYSIS PROCESSES AND COMPOSITIONS,” naming Charles R. Cantor as applicant, and designated by attorney docket no. SEQ-6012-PV. The entire content of each of the foregoing patent applications hereby is incorporated by reference herein, including all text, drawings and tables, in jurisdictions providing for such incorporation.

FIELD OF THE INVENTION

The invention pertains generally to the field of nucleic acid sequence analysis and methodology and components for use in such analysis.

SUMMARY

Improved solid supports and methods for analyzing target nucleotide sequences are provided herein. Certain improvements are directed to efficiently preparing nucleic acids that comprise nucleotide sequences identical to or substantially identical to one or more target nucleotide sequences of a sample nucleic acid, or complement thereof. The prepared nucleic acids include reference sequences that facilitate sequence analysis. The solid supports and methods provided herein minimize the number of steps required by published sequence analysis methodologies, and thereby offer improved sequence analysis efficiency.

The invention in part provides a method for preparing sample nucleic acid complements, which comprises: (a) preparing a mixture comprising sample nucleic acid and a solid support under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule, where: the solid support comprises single-stranded solid phase nucleic acid including a primer sequence, an identifier sequence and a probe sequence; the probe sequence hybridizes to sample nucleic acid when the probe sequence is complementary to a nucleotide sequence in the sample nucleic acid; the solid phase nucleic acid shares a common probe sequence or the solid phase nucleic acid does not share a common probe sequence; and the solid phase nucleic acid shares a common identifier sequence; and the solid support and sample nucleic acid are contacted in the mixture under conditions that allow hybridization of the solid phase nucleic acid to the sample nucleic acid; and (b) contacting the mixture with extension agents under conditions in which the solid phase nucleic acid hybridized to sample nucleic acid is extended; whereby sample nucleic acid complements are prepared. In certain embodiments, the extended solid phase nucleic acids are amplified by an amplification process (e.g., a linear amplification process in which extension agents include a primer that hybridizes to the primer sequence and is extended to generate amplification products).
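As an informal illustration only, steps (a) and (b) above can be modeled in silico. The sketch below is not part of the described method: it assumes DNA written 5′ to 3′, assumes the solid phase nucleic acid is ordered primer, identifier, probe with its 3′ end free for extension, and requires a perfect probe match. The names `SolidPhaseOligo` and `hybridize_and_extend` are hypothetical.

```python
from dataclasses import dataclass

# Watson-Crick complement table for DNA (illustrative; ignores analogs and RNA).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class SolidPhaseOligo:
    """Hypothetical model of one solid phase nucleic acid (5' to 3')."""
    primer: str
    identifier: str
    probe: str

    @property
    def sequence(self) -> str:
        # Assumed order: primer, then identifier, then probe at the 3' end.
        return self.primer + self.identifier + self.probe


def hybridize_and_extend(oligo: SolidPhaseOligo, sample: str):
    """Return the extended oligo, or None if the probe finds no complement.

    The probe hybridizes where the sample contains the probe's reverse
    complement. Extension from the probe's 3' end copies the sample
    upstream (5') of that site as its reverse complement.
    """
    site = reverse_complement(oligo.probe)
    start = sample.find(site)
    if start == -1:
        return None  # no complementary sequence, so no hybridization
    return oligo.sequence + reverse_complement(sample[:start])


oligo = SolidPhaseOligo(primer="AAAA", identifier="CC", probe="ACGG")
extended = hybridize_and_extend(oligo, "TTGACCGTAA")
```

In this toy case the extended product carries the primer and identifier sequences followed by a complement of the sample strand, which is the property the method relies on for downstream analysis.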

In embodiments described herein, a solid support may be in a collection of solid supports, and the invention in part provides collections of solid supports and methods in which a collection of solid supports is contacted with sample nucleic acid. In some embodiments pertaining to solid support collections and methods of use, at least one nucleic acid of the solid phase nucleic acid of each of the solid supports in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid of the other solid supports; and the solid phase nucleic acid of each of the solid supports in the collection share a unique identifier sequence different than the identifier sequence of the solid phase nucleic acid of the other solid supports.
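Because each solid support in such a collection carries its own identifier sequence, sequence reads can in principle be sorted by the support they came from. The following is a minimal sketch under stated assumptions, not a description of any particular analysis pipeline: it assumes every read begins with a shared primer sequence followed by a fixed-length identifier, and that identifiers must match exactly. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def group_reads_by_identifier(reads, primer, identifier_length, known_identifiers):
    """Group reads by the identifier that follows the primer sequence.

    Reads that lack the primer or carry an unknown identifier are skipped.
    The value stored for each identifier is the remainder of the read
    (probe sequence plus any extension product).
    """
    groups = defaultdict(list)
    for read in reads:
        if not read.startswith(primer):
            continue  # assumed layout: primer is at the start of the read
        start = len(primer)
        identifier = read[start:start + identifier_length]
        if identifier in known_identifiers:
            groups[identifier].append(read[start + identifier_length:])
    return dict(groups)


reads = ["AAAACCACGGTCAA", "AAAAGGTTTT", "TTTTCCAAAA", "AAAATTAC"]
grouped = group_reads_by_identifier(reads, "AAAA", 2, {"CC", "GG"})
```

A real analysis would likely tolerate sequencing errors in the identifier; exact matching is used here only to keep the sketch short.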

The invention also in part provides a method for sequence analysis, which comprises: (a) preparing a mixture comprising sample nucleic acid and a solid support under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule, where: the solid support comprises single-stranded solid phase nucleic acid comprising a primer sequence, an identifier sequence and a probe sequence; the probe sequence hybridizes to sample nucleic acid when the probe sequence is complementary to a nucleotide sequence in the sample nucleic acid; the solid phase nucleic acid shares a common probe sequence or the solid phase nucleic acid does not share a common probe sequence; and the solid phase nucleic acid shares a common identifier sequence; and the sample nucleic acid and the solid support are contacted in the mixture under conditions that allow hybridization of the solid phase nucleic acid to the sample nucleic acid; (b) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to the sample nucleic acid is extended; (c) amplifying the extended solid phase nucleic acid of (b); and (d) analyzing the sequences of the amplification products of (c); whereby the target nucleic acid sequence is analyzed.

Also, the invention in part provides a method for obtaining sequence information of a target nucleic acid, which comprises: (a) preparing a mixture comprising sample nucleic acid and a solid support under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule, where: the solid support comprises single-stranded solid phase nucleic acid having a primer sequence, an identifier sequence and a probe sequence; the probe sequence hybridizes to sample nucleic acid when the probe sequence is complementary to a nucleotide sequence in the sample nucleic acid; the solid phase nucleic acid shares a common probe sequence or the solid phase nucleic acid does not share a common probe sequence; the solid phase nucleic acid shares a common identifier sequence; and the sample nucleic acid is nucleic acid from an organism that has been subject to fragmentation and/or specific cleavage; and the sample nucleic acid and the solid support are contacted in the mixture under conditions that allow hybridization of the solid phase nucleic acid to the sample nucleic acid; (b) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (c) amplifying the extended solid phase nucleic acid of (b); (d) determining the nucleotide sequences of the amplification products of (c); and (e) constructing sequence information of the target nucleic acid from the nucleotide sequences of (d).

The invention also in part provides a solid support comprising single-stranded solid phase nucleic acid having an identifier sequence and a probe sequence, where: the probe sequence hybridizes to sample nucleic acid when the probe sequence is complementary to a nucleotide sequence in the sample nucleic acid; the solid phase nucleic acid shares a common probe sequence or the solid phase nucleic acid does not share a common probe sequence; and the solid phase nucleic acid shares a common identifier sequence. Such a solid support can be in a collection of solid supports as described herein.

The invention also in part provides a method for manufacturing a solid support having single-stranded solid phase nucleic acid, which comprises: (a) sequentially linking nucleotides to a nucleotide covalently linked to the solid support whereby each solid phase nucleic acid is prepared in association with the solid support; or (b) linking each single-stranded nucleic acid in solution phase to the solid support whereby the single-stranded solid phase nucleic acid is in association with the solid support; where: the single-stranded solid phase nucleic acid comprises an identifier sequence and a probe sequence; the probe sequence is complementary to a target nucleotide sequence; the nucleic acid shares a common probe sequence or the nucleic acid does not share a common probe sequence; and the nucleic acid shares a common identifier sequence.

In certain embodiments, the solid support is in a collection of solid supports; at least one nucleic acid of the solid phase nucleic acid of each of the solid supports in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid of the other solid supports; and the solid phase nucleic acid of each of the solid supports in the collection share a unique identifier sequence different than the identifier sequence of the solid phase nucleic acid of the other solid supports.

Also, the invention in part provides a substrate comprising a collection of beads oriented in an array, where: each bead is in association with single-stranded nucleic acid; the solid phase nucleic acid comprises an identifier sequence and a probe sequence; the probe sequence hybridizes to sample nucleic acid when the probe sequence is complementary to a nucleotide sequence in the sample nucleic acid; at least one nucleic acid of the solid phase nucleic acid of each of the solid supports in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid of the other solid supports; and the solid phase nucleic acid of each of the solid supports in the collection share a unique identifier sequence different than the identifier sequence of solid phase nucleic acid of the other solid supports.

The invention also in part provides a kit comprising a solid support having single-stranded solid phase nucleic acid; one or more agents that can extend solid phase nucleic acid hybridized to sample nucleic acid; and instructions for using the solid support and the one or more agents; where: the solid phase nucleic acid comprises an identifier sequence and a probe sequence; the probe sequence hybridizes to sample nucleic acid when the probe sequence is complementary to a nucleotide sequence in the sample nucleic acid; the solid phase nucleic acid shares a common probe sequence or the solid phase nucleic acid does not share a common probe sequence; and the solid phase nucleic acid shares a common identifier sequence. In some embodiments, the solid support is in a collection of solid supports; at least one nucleic acid of the solid phase nucleic acid of each of the solid supports in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid of the other solid supports; and the solid phase nucleic acid of each of the solid supports in the collection share a unique identifier sequence different than the identifier sequence of solid phase nucleic acid of the other solid supports.

Certain embodiments and features of the invention are described in greater detail in the following description, claims and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings illustrate features of certain embodiments of the invention.

FIGS. 1A-1C show examples of solid support embodiments.

FIG. 2 shows a representative process for generating nucleic acids having a nucleotide sequence complementary to a target nucleotide sequence.

DETAILED DESCRIPTION

Improved nucleic acid sequence analysis processes and solid supports described herein find multiple uses in research and clinical applications. Such processes and solid supports can be utilized, for example, to: (a) rapidly determine whether a particular target sequence is present in a sample; (b) perform mixture analysis, e.g., identify a mixture and/or its composition or determine the frequency of a target sequence in a mixture (e.g., mixed communities, quasispecies); (c) detect sequence variations (e.g., mutations, single nucleotide polymorphisms) in a sample; (d) perform haplotyping determinations; (e) perform microorganism (e.g., pathogen) typing; (f) detect the presence or absence of a microorganism target sequence in a sample; (g) identify disease markers; (h) detect microsatellites; (i) identify short tandem repeats; (j) identify an organism or organisms; (k) detect allelic variations; (l) determine allelic frequency; (m) determine methylation patterns; (n) perform epigenetic determinations; (o) re-sequence a region of a biomolecule; (p) human clinical research and medicine (e.g., cancer marker detection, sequence variation detection; detection of sequence signatures favorable or unfavorable for a particular drug administration); (q) HLA typing; (r) forensics; (s) vaccine quality control; (t) treatment monitoring; (u) vector identity; (v) perform vaccine or production strain quality control; (w) detect test strain identity; (x) identify a specific viral nucleic acid sequence or sequences in a viral mixture or population (e.g., hepatitis mixtures, HIV mixtures, mixed viral populations as might be found in an immuno-deficient or immuno-compromised organism). Certain aspects of the invention are described hereafter.

Sample Nucleic Acid

Sample nucleic acid may be derived from one or more samples or sources. As used herein, “nucleic acid” refers to polynucleotides such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The term should also be understood to include, as equivalents, derivatives, variants and analogs of RNA or DNA made from nucleotide analogs, single (sense or antisense) and double-stranded polynucleotides. It is understood that the term “nucleic acid” does not refer to or imply a specific length of the polynucleotide chain, thus nucleotides, polynucleotides, and oligonucleotides are also included in the definition. Deoxyribonucleotides include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. For RNA, the nucleoside having a uracil base is uridine. A source or sample containing sample nucleic acid(s) may contain one or a plurality of sample nucleic acids. A plurality of sample nucleic acids as described herein refers to at least 2 sample nucleic acids and includes nucleic acid sequences that may be identical or different. That is, the sample nucleic acids may all be representative of the same nucleic acid sequence, or may be representative of two or more different nucleic acid sequences (e.g., from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 50, 100, 1000 or more sequences).

A sample may be collected from an organism, mineral or geological site (e.g., soil, rock, mineral deposit, combat theater), forensic site (e.g., crime scene, contraband or suspected contraband), or a paleontological or archeological site (e.g., fossil, or bone), for example. A sample may be a “biological sample,” which refers to any material obtained from a living source or formerly-living source, for example, an animal such as a human or other mammal, a plant, a bacterium, a fungus, a protist or a virus. The biological sample can be in any form, including without limitation a solid material such as a tissue, an organ, cells, a cell pellet, a cell extract, or a biopsy, or a biological fluid such as urine, blood, saliva, amniotic fluid, cerebral spinal fluid, synovial fluid, exudate from a region of infection or inflammation, or a mouth wash containing buccal cells. A sample also may be isolated at a different time point as compared to another sample, where each of the samples is from the same or a different source. A sample nucleic acid may be from a nucleic acid library, such as a cDNA or RNA library, for example. A sample nucleic acid may be a result of nucleic acid purification or isolation and/or amplification of nucleic acid molecules from the sample. Sample nucleic acid provided for sequence analysis processes described herein may contain nucleic acid from one sample or from two or more samples (e.g., from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 samples). A nucleic acid sample can contain host and non-host sample nucleic acid, and in some embodiments, a sample may contain two or more different species of sample nucleic acid (e.g., mutant vs. wild-type, transplants, forensics, mother vs. fetus).

Sample nucleic acid may comprise or consist essentially of any type of nucleic acid suitable for use with processes of the invention, such as sample nucleic acid that can hybridize to solid phase nucleic acid (described hereafter), for example. A sample nucleic acid in certain embodiments can comprise or consist essentially of DNA (e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), RNA (e.g., messenger RNA (mRNA), short inhibitory RNA (siRNA), ribosomal RNA (rRNA), tRNA and the like), and/or DNA or RNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like). A nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like). A nucleic acid may be, or may be from, a plasmid, phage, autonomously replicating sequence (ARS), centromere, artificial chromosome, chromosome, a cell, a cell nucleus or cytoplasm of a cell in certain embodiments. A sample nucleic acid in some embodiments is from a single chromosome (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism).

Sample nucleic acid may be provided for conducting methods described herein without processing of the sample(s) containing the nucleic acid in certain embodiments. In some embodiments, sample nucleic acid is provided for conducting methods described herein after processing of the sample(s) containing the nucleic acid. For example, a sample nucleic acid may be extracted, isolated, purified or amplified from the sample(s). The term “isolated” as used herein refers to nucleic acid removed from its original environment (e.g., the natural environment if it is naturally occurring, or a host cell if expressed exogenously), and thus is altered “by the hand of man” from its original environment. An isolated nucleic acid generally is provided with fewer non-nucleic acid components (e.g., protein, lipid) than the amount of components present in a source sample. A composition comprising isolated sample nucleic acid can be substantially isolated (e.g., about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% free of non-nucleic acid components). The term “purified” as used herein refers to sample nucleic acid provided that contains fewer nucleic acid species than in the sample source from which the sample nucleic acid is derived. A composition comprising sample nucleic acid may be substantially purified (e.g., about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% free of other nucleic acid species). The term “amplified” as used herein refers to subjecting nucleic acid of a sample to a process that linearly or exponentially generates amplicon nucleic acids having the same or substantially the same nucleotide sequence as the nucleotide sequence of the nucleic acid in the sample, or portion thereof.

Sample nucleic acid also may be processed by subjecting nucleic acid to a method that generates nucleic acid fragments, in certain embodiments, before providing sample nucleic acid for a process described herein. In some embodiments, sample nucleic acid subjected to fragmentation or cleavage may have a nominal, average or mean length of about 5 to about 10,000 base pairs, about 100 to about 1,000 base pairs, about 100 to about 500 base pairs, or about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 base pairs. Fragments can be generated by any suitable method known in the art, and the average, mean or nominal length of nucleic acid fragments can be controlled by selecting an appropriate fragment-generating procedure. In certain embodiments, sample nucleic acid of a relatively shorter length can be utilized to analyze sequences that contain little sequence variation and/or contain relatively large amounts of known nucleotide sequence information. In some embodiments, sample nucleic acid of a relatively longer length can be utilized to analyze sequences that contain greater sequence variation and/or contain relatively small amounts of known nucleotide sequence information.

Sample nucleic acid fragments often contain overlapping nucleotide sequences, and such overlapping sequences can facilitate construction of a nucleotide sequence of the previously non-fragmented sample nucleic acid, or a portion thereof. For example, one fragment may have subsequences x and y and another fragment may have subsequences y and z, where x, y and z are nucleotide sequences that can be 5 nucleotides in length or greater. Overlap sequence y can be utilized to facilitate construction of the x-y-z nucleotide sequence in nucleic acid from a sample. Sample nucleic acid may be partially fragmented (e.g., from an incomplete or terminated specific cleavage reaction) or fully fragmented in certain embodiments.
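The x-y-z construction described above can be sketched as a simple suffix/prefix merge. This is a minimal illustration, assuming error-free fragments on the same strand and an exact overlap; the function name `merge_by_overlap` and the example sequences are hypothetical, and the default minimum overlap of 5 follows the subsequence length mentioned in the text.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 5):
    """Merge two fragments that share an overlap (suffix of left = prefix of right).

    Returns the merged sequence using the longest exact overlap of at
    least min_overlap nucleotides, or None if no such overlap exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


x, y, z = "AAGGT", "CCATG", "TTTCA"
merged = merge_by_overlap(x + y, y + z)  # fragments x-y and y-z share y
```

Here the shared subsequence y lets the two fragments be joined into x-y-z; practical assembly methods additionally handle sequencing errors, repeats and both strands.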

Sample nucleic acid can be fragmented by various methods, which include, without limitation, physical, chemical and enzymic processes. Examples of such processes are described in U.S. Patent Application Publication No. 20050112590 (published on May 26, 2005, entitled “Fragmentation-based methods and systems for sequence variation detection and discovery,” naming Van Den Boom et al.). Certain processes can be selected to generate non-specifically cleaved fragments or specifically cleaved fragments. Examples of processes that can generate non-specifically cleaved fragment sample nucleic acid include, without limitation, contacting sample nucleic acid with an apparatus that exposes nucleic acid to shearing force (e.g., passing nucleic acid through a syringe needle; use of a French press); exposing sample nucleic acid to irradiation (e.g., gamma, x-ray, UV irradiation; fragment sizes can be controlled by irradiation intensity); boiling nucleic acid in water (e.g., yields about 500 base pair fragments); and exposing nucleic acid to an acid and base hydrolysis process.

Sample nucleic acid may be specifically cleaved by contacting the nucleic acid with one or more specific cleavage agents. The term “specific cleavage agent” as used herein refers to an agent, sometimes a chemical or an enzyme, that can cleave a nucleic acid at one or more specific sites. Specific cleavage agents often will cleave specifically according to a particular nucleotide sequence at a particular site.

Examples of enzymic specific cleavage agents include without limitation endonucleases (e.g., DNase (e.g., DNase I, II); RNase (e.g., RNase E, F, H, P); Cleavase™ enzyme; Taq DNA polymerase; E. coli DNA polymerase I and eukaryotic structure-specific endonucleases; murine FEN-1 endonucleases; type I, II or III restriction endonucleases such as Acc I, Afl III, Alu I, Alw44 I, Apa I, Asn I, Ava I, Ava II, BamH I, Ban II, Bcl I, Bgl I, Bgl II, Bln I, Bsm I, BssH II, BstE II, Cfo I, Cla I, Dde I, Dpn I, Dra I, EclX I, EcoR I, EcoR II, EcoR V, Hae II, Hae III, Hind II, Hind III, Hpa I, Hpa II, Kpn I, Ksp I, Mlu I, MluN I, Msp I, Nci I, Nco I, Nde I, Nde II, Nhe I, Not I, Nru I, Nsi I, Pst I, Pvu I, Pvu II, Rsa I, Sac I, Sal I, Sau3A I, Sca I, ScrF I, Sfi I, Sma I, Spe I, Sph I, Ssp I, Stu I, Sty I, Swa I, Taq I, Xba I, Xho I); glycosylases (e.g., uracil-DNA glycosylase (UDG), 3-methyladenine DNA glycosylase, 3-methyladenine DNA glycosylase II, pyrimidine hydrate-DNA glycosylase, FaPy-DNA glycosylase, thymine mismatch-DNA glycosylase, hypoxanthine-DNA glycosylase, 5-Hydroxymethyluracil DNA glycosylase (HmUDG), 5-Hydroxymethylcytosine DNA glycosylase, or 1,N6-etheno-adenine DNA glycosylase); exonucleases (e.g., exonuclease III); ribozymes, and DNAzymes. Sample nucleic acid may be treated with a chemical agent, or synthesized using modified nucleotides, and the modified nucleic acid may be cleaved. In non-limiting examples, sample nucleic acid may be treated with (i) alkylating agents such as methylnitrosourea that generate several alkylated bases, including N3-methyladenine and N3-methylguanine, which are recognized and cleaved by alkyl purine DNA-glycosylase; (ii) sodium bisulfite, which causes deamination of cytosine residues in DNA to form uracil residues that can be cleaved by uracil N-glycosylase; and (iii) a chemical agent that converts guanine to its oxidized form, 8-hydroxyguanine, which can be cleaved by formamidopyrimidine DNA N-glycosylase. Examples of chemical cleavage processes include without limitation alkylation (e.g., alkylation of phosphorothioate-modified nucleic acid); cleavage of acid-labile P3′-N5′-phosphoroamidate-containing nucleic acid; and osmium tetroxide and piperidine treatment of nucleic acid.
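Sequence-specific cleavage of the kind performed by a restriction endonuclease can be illustrated with a short single-strand simulation. The sketch below makes several assumptions that are not stated in this document: it uses the commonly reported EcoR I recognition sequence GAATTC with the cut between G and A, considers only one strand, and ignores methylation sensitivity and incomplete digestion. The function name `digest` is hypothetical.

```python
def digest(seq: str, site: str, cut_offset: int):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Returns the list of fragments in 5' to 3' order; joining the
    fragments reproduces the original sequence.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Assumed EcoR I specificity: G^AATTC (cut one base into the site).
fragments = digest("AAGAATTCTTGAATTCGG", "GAATTC", 1)
```

Running the same sequence through `digest` with a different site models a complementary cleavage reaction, in which alternate cleavage patterns of the same nucleic acid are generated.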

As used herein, the term “complementary cleavage reactions” refers to cleavage reactions that are carried out on the same sample nucleic acid using different cleavage reagents or by altering the cleavage specificity of the same cleavage reagent such that alternate cleavage patterns of the same target or reference nucleic acid or protein are generated. In certain embodiments, sample nucleic acid may be treated with one or more specific cleavage agents (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more specific cleavage agents) in one or more reaction vessels (e.g., sample nucleic acid is treated with each specific cleavage agent in a separate vessel).

Sample nucleic acid also may be exposed to a process that modifies certain nucleotides in the nucleic acid before providing sample nucleic acid for a method described herein. A process that selectively modifies nucleic acid based upon the methylation state of nucleotides therein can be applied to sample nucleic acid. The term “methylation state” as used herein refers to whether a particular nucleotide in a polynucleotide sequence is methylated or not methylated. Methods for modifying a target nucleic acid molecule in a manner that reflects the methylation pattern of the target nucleic acid molecule are known in the art, as exemplified in U.S. Pat. No. 5,786,146 and U.S. patent publications 20030180779 and 20030082600. For example, non-methylated cytosine nucleotides in a nucleic acid can be converted to uracil by bisulfite treatment, which does not modify methylated cytosine. Non-limiting examples of agents that can modify a nucleotide sequence of a nucleic acid include methylmethane sulfonate, ethylmethane sulfonate, diethylsulfate, nitrosoguanidine (N-methyl-N′-nitro-N-nitrosoguanidine), nitrous acid, di-(2-chloroethyl)sulfide, di-(2-chloroethyl)methylamine, 2-aminopurine, 5-bromouracil, hydroxylamine, sodium bisulfite, hydrazine, formic acid, sodium nitrite, and 5-methylcytosine DNA glycosylase. In addition, conditions such as high temperature, ultraviolet radiation and x-radiation can induce changes in the sequence of a nucleic acid molecule.
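The bisulfite example above can be modeled as a string transformation in which every cytosine becomes uracil unless it is methylated. This is an idealized sketch, assuming complete conversion and a known set of methylated positions given as 0-based indices; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Convert non-methylated C to U; methylated C (by index) is left unchanged."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(original: str, converted: str):
    """Return positions where a C in the original survived conversion."""
    return [
        i for i, (a, b) in enumerate(zip(original, converted))
        if a == "C" and b == "C"
    ]


original = "ACGTCCG"
converted = bisulfite_convert(original, methylated_positions={4})
methylated = call_methylation(original, converted)
```

Comparing treated and untreated sequence in this way is how a methylation pattern becomes readable as a sequence difference.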

Sample nucleic acid may be provided in any form useful for conducting a sequence analysis or manufacture process described herein, such as solid or liquid form, for example. In certain embodiments, sample nucleic acid may be provided in a liquid form optionally comprising one or more other components, including without limitation one or more buffers or salts.

Solid Supports and Solid Phase Nucleic Acid

The term “solid support” or “solid phase” as used herein refers to a wide variety of materials including solids, semi-solids, gels, films, membranes, meshes, felts, composites, particles, and the like typically used to sequester molecules, and more specifically refers to an insoluble material with which nucleic acid can be associated. A solid support for use with processes described herein sometimes is selected in part according to size: solid supports having a size smaller than the size of a microreactor (defined hereafter) sometimes are selected. Examples of solid supports for use with processes described herein include, without limitation, beads (e.g., microbeads, nanobeads) and particles (e.g., microparticles, nanoparticles).

The terms “beads” and “particles” as used herein refer to solid supports suitable for associating with biomolecules, and more specifically nucleic acids. Beads may have a regular (e.g., spheroid, ovoid) or irregular shape (e.g., rough, jagged), and sometimes are non-spherical (e.g., angular, multi-sided). Particles or beads having a nominal, average or mean diameter less than the nominal, average, mean or minimum diameter of a microreactor can be utilized. Particles or beads having a nominal, average or mean diameter of about 1 nanometer to about 500 micrometers can be utilized, such as those having a nominal, mean or average diameter, for example, of about 10 nanometers to about 100 micrometers; about 100 nanometers to about 100 micrometers; about 1 micrometer to about 100 micrometers; about 10 micrometers to about 50 micrometers; about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800 or 900 nanometers; or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500 micrometers.

A bead or particle can be made of virtually any insoluble or solid material. For example, the bead or particle can comprise or consist essentially of silica gel, glass (e.g., controlled-pore glass (CPG)), nylon, Sephadex®, Sepharose®, cellulose, a metal surface (e.g., steel, gold, silver, aluminum, silicon and copper), a magnetic material, a plastic material (e.g., polyethylene, polypropylene, polyamide, polyester, polyvinylidenedifluoride (PVDF)) and the like. Beads or particles may be swellable (e.g., polymeric beads such as Wang resin) or non-swellable (e.g., CPG). Commercially available examples of beads include without limitation Wang resin, Merrifield resin and Dynabeads®. Beads may also be made as solid particles or particles that contain internal voids.

Solid supports suitable for use with sequence analysis processes described herein often are in association with nucleic acid referred to herein as “solid phase nucleic acid.” The term “solid phase nucleic acid” as used herein generally refers to one or more different nucleic acid species in association with a solid support. A solid phase “nucleic acid species” as used herein refers to a first nucleic acid having a nucleotide sequence that differs by one nucleotide base or more from the nucleotide sequence of a second nucleic acid when the nucleotide sequences of the first and second nucleic acids are aligned. Thus one nucleic acid species may differ from a second nucleic acid species by one or more nucleotides when the nucleotide sequences of the first and second nucleic acids are aligned with one another (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more than 100 nucleotide differences).
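The species definition above reduces to counting mismatches between two aligned sequences. The sketch below is a simplification, assuming the two sequences are already aligned and of equal length, so that no gaps need to be considered; the function names are hypothetical.

```python
def count_differences(first: str, second: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(first, second))


def are_different_species(first: str, second: str) -> bool:
    """Two sequences are different species if they differ at one or more positions."""
    return count_differences(first, second) >= 1


differences = count_differences("ACGTACGT", "ACGAACGT")
```

A single mismatch is enough for two solid phase nucleic acids to count as different species under this definition.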

A solid support may be provided in a collection of solid supports. A solid support collection may comprise two or more different solid support species. The term “solid support species” as used herein refers to a solid support in association with one particular solid phase nucleic acid species or a particular combination of different solid phase nucleic acid species. In certain embodiments, a solid support collection comprises 2 to 10,000 solid support species, 10 to 1,000 solid support species or about 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 unique solid support species. The solid supports (e.g., beads) in the collection of solid supports may be homogeneous (e.g., all are Wang resin beads) or heterogeneous (e.g., some are Wang resin beads and some are magnetic beads).

Solid phase nucleic acid generally is single-stranded and is of any type suitable for hybridizing sample nucleic acid (e.g., DNA, RNA, analogs thereof (e.g., peptide nucleic acid (PNA)), chimeras thereof (e.g., a single strand comprises RNA bases and DNA bases) and the like). Solid phase nucleic acid is associated with the solid support in any manner suitable for hybridization of solid phase nucleic acid to sample nucleic acid. Solid phase nucleic acid may be in association with a solid support by a covalent linkage or a non-covalent interaction. Non-limiting examples of non-covalent interactions include hydrophobic interactions (e.g., C18 coated solid support and tritylated nucleic acid), polar interactions (e.g., “wetting” association between nucleic acid/polyethylene glycol), pair interactions including without limitation, antibody/antigen, antibody/antibody, antibody/antibody fragment, antibody/antibody receptor, antibody/protein A or protein G, hapten/anti-hapten, biotin/avidin, biotin/streptavidin, folic acid/folate binding protein, vitamin B12/intrinsic factor, nucleic acid/complementary nucleic acid (e.g., DNA, RNA, PNA) and the like.

Solid phase nucleic acid may be associated with a solid support by different methodologies, which include, without limitation, (i) sequentially synthesizing nucleic acid directly on a solid support, and (ii) synthesizing nucleic acid, providing the nucleic acid in solution phase and linking the nucleic acid to a solid support. Solid phase nucleic acid may be linked covalently at various sites in the nucleic acid to the solid support, such as (i) at a 1′, 2′, 3′, 4′ or 5′ position of a sugar moiety or (ii) a pyrimidine or purine base moiety, of a terminal or non-terminal nucleotide of the nucleic acid, for example. The 5′ terminal nucleotide of the solid phase nucleic acid is linked to the solid support in certain embodiments.

Methods for sequentially synthesizing nucleic acid directly on a solid support are known. For example, the 3′ end of nucleic acid can be linked to the solid support (e.g., phosphoramidite method described in Caruthers, Science 230: 281-286 (1985)) or the 5′ end of the nucleic acid can be linked to the solid support (e.g., Claeboe et al., Nucleic Acids Res. 31(19): 5685-5691 (2003)).

Methods for linking solution phase nucleic acid to a solid support also are known (e.g., U.S. Pat. No. 6,133,436, naming Koster et al. and entitled “Beads bound to a solid support and to nucleic acids” and WO 91/08307, naming Van Ness and entitled “Enhanced capture of target nucleic acid by the use of oligonucleotides covalently attached to polymers”). Examples include, without limitation, thioether linkages (e.g., thiolated nucleic acid); disulfide linkages (e.g., thiol beads, thiolated nucleic acid); amide linkages (e.g., Wang resin, amino-linked nucleic acid); acid labile linkages (e.g., glass beads, tritylated nucleic acid) and the like. Nucleic acid may be linked to a solid support without a linker or with a linker (e.g., S. S. Wong, “Chemistry of Protein Conjugation and Cross-Linking,” CRC Press (1991), and G. T. Hermanson, “Bioconjugate Techniques,” Academic Press (1995)). A homo- or hetero-bifunctional linker reagent can be selected, and examples of linkers include without limitation N-succinimidyl(4-iodoacetyl)aminobenzoate (SIAB), dimaleimide, dithio-bis-nitrobenzoic acid (DTNB), N-succinimidyl-S-acetyl-thioacetate (SATA), N-succinimidyl-3-(2-pyridyldithio) propionate (SPDP), succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), 6-hydrazinonicotinamide (HYNIC), 3-amino-(2-nitrophenyl)propionic acid and the like.

Nucleic acid can be synthesized using standard methods and equipment, such as the ABI® 3900 High Throughput DNA Synthesizer and EXPEDITE® 8909 Nucleic Acid Synthesizer, both of which are available from Applied Biosystems (Foster City, Calif.). Analogs and derivatives are described in U.S. Pat. Nos. 4,469,863; 5,536,821; 5,541,306; 5,637,683; 5,637,684; 5,700,922; 5,717,083; 5,719,262; 5,739,308; 5,773,601; 5,886,165; 5,929,226; 5,977,296; 6,140,482; WO 00/56746; WO 01/14398, and related publications. Methods for synthesizing nucleic acids comprising such analogs or derivatives are disclosed, for example, in the patent publications cited above and in U.S. Pat. Nos. 5,614,622; 5,739,314; 5,955,599; 5,962,674; 6,117,992; in WO 00/75372 and in related publications. In certain embodiments, analog nucleic acids include inosines, abasic sites, locked nucleic acids, minor groove binders, duplex stabilizers (e.g., acridine, spermidine) and/or other melting temperature modifiers (e.g., target nucleic acid, solid phase nucleic acid, and/or primer nucleic acid may comprise an analog).

The density of solid phase nucleic acid molecules per solid support unit (e.g., one bead) can be selected. A maximum density can be selected that allows for hybridization of sample nucleic acid to solid phase nucleic acid. In certain embodiments, solid phase nucleic acid density per solid support unit (e.g., nucleic acid molecules per bead) is about 5 nucleic acids to about 10,000 nucleic acids per solid support. The density of the solid phase nucleic acid per unit solid support in some embodiments is about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 nucleic acids per solid support. In certain embodiments the density of the solid phase nucleic acid per unit solid support is about 1 to 1 (e.g., one molecule of solid phase nucleic acid to one bead).

In some embodiments, solid phase nucleic acid comprises certain subsequences. One subsequence may be complementary to or substantially complementary to a sample nucleic acid nucleotide subsequence and allows solid phase nucleic acid to hybridize to sample nucleic acid. Such a subsequence (e.g., illustrated in FIGS. 1A-1C) is referred to herein as a “probe” sequence, and a solid support can contain one or more probe sequence species. A “probe sequence species” as used herein refers to a first probe nucleotide sequence that differs by one nucleotide base or more from a second probe nucleotide sequence when the first and second probe nucleotide sequences are aligned. Thus one probe sequence species may differ from a second probe sequence species by one or more nucleotides when the first and second probe sequences are aligned with one another (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more than 100 nucleotides that are not identical upon alignment). Alignment techniques and sequence identity assessment methodology are known (e.g., algorithm of Meyers & Miller, CABIOS 4: 11-17 (1989), which has been incorporated into the ALIGN program (version 2.0)).

A probe nucleotide sequence is of a length sufficient to specifically hybridize to a sample nucleic acid nucleotide sequence. In certain embodiments a probe sequence is about 5 to about 100 nucleotides in length, and sometimes is about 5 to about 40 nucleotides in length. Generally, a shorter probe sequence is selected for applications where the target nucleotide sequence is known or partially known and longer probe sequences are selected for applications in which the target nucleotide sequence or portions thereof are not known. In some embodiments, a probe sequence is about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000 or 5000 nucleotides in length.

In some embodiments solid phase nucleic acid of a solid support species, or a collection of solid support species, may include any number of probe sequence species useful for carrying out sequence analysis processes provided herein. In certain embodiments, one solid support comprises about 10 to about 10,000 unique probe sequences (e.g., about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or about 10,000 different probe sequence species); one solid support comprises about 10 to about 1,000 unique probe sequences (e.g., about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or about 1,000 different probe sequence species); a collection of solid supports comprises about 10 to about 10,000 unique probe sequences (e.g., about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or about 10,000 different probe sequence species); and a collection of solid supports comprises about 10 to about 1,000 unique probe sequences (e.g., about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or about 1,000 different probe sequence species). In some embodiments, fewer probe sequence species per solid support or per solid support collection are utilized (e.g., for haplotyping applications) and sometimes greater numbers of probe sequence species per solid support or solid support collection are utilized (e.g., for sequencing applications). In certain embodiments, one solid support, or a collection of solid supports, includes about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 unique probe sequence species. In certain embodiments, solid phase nucleic acid of one solid support species shares only one probe sequence species, and in related collection embodiments, solid phase nucleic acid of each solid support species in the collection shares only one probe sequence species (i.e., one probe sequence species per solid support species).

In certain embodiments, solid phase nucleic acid also may contain an identification sequence (e.g., illustrated in FIGS. 1A-1C), which may be useful in part for constructing partial sequence reads into larger sequence constructions in certain embodiments. An identification sequence can be “unique” for each solid support species, where the term “unique” as used here refers to there being one identification sequence species for each solid support species. An “identification sequence species” as used herein refers to a first identification nucleotide sequence that differs by one nucleotide base or more from a second identification nucleotide sequence when the first and second identification nucleotide sequences are aligned. Thus one identification sequence species may differ from a second identification sequence species by one or more nucleotides when the first and second identification sequences are aligned with one another (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more than 100 nucleotides that are not identical upon alignment). An identification sequence may be detected, in some embodiments, by a property selected from the group consisting of size, shape, electrical properties, magnetic properties, optical properties, chemical properties, and the like.

An identification sequence may be of any length suitable for analyzing the nucleotide sequence or partial nucleotide sequence of sample nucleic acid. In some embodiments, an identification sequence is about 5 to about 50 contiguous nucleotides in length, sometimes about 5 to about 20 nucleotides in length and at times about 10 nucleotides in length. In certain embodiments, an identification sequence is about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in length.

In some embodiments, solid phase nucleic acid often includes a primer sequence (Pr), which is also referred to herein as a “primer hybridization sequence.” Primer sequence (Pr) can hybridize to a complementary nucleotide sequence in a primer nucleic acid that can be utilized to amplify extended solid phase nucleic acid previously hybridized to a sample nucleic acid. As used herein, the term “primer nucleic acid” refers to a nucleic acid (e.g., naturally occurring or synthetic) that has a nucleic acid sequence complementary to a primer hybridization sequence, and can hybridize to the primer hybridization sequence under hybridization conditions and can be extended in an amplification process (e.g., primer extension, PCR amplification, and the like). Primer nucleic acids may be of any length suitable for optimized hybridization and may be in the range of about 5 nucleotides to about 5000 nucleotides in length (e.g., about 5 nucleotides, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, or about 5000 nucleotides in length). In some embodiments primer nucleic acids may be modified, which may be effected by a modification process including, without limitation, modification of codons, synthesis using nucleotide analogs, post synthetic modification and the like.

In a collection of solid supports, each solid phase nucleic acid on each solid support species may have a common primer sequence (e.g., all solid support species have the same primer sequence species), in which case the primer sequence is referred to as a “universal” or “common” primer sequence. In certain embodiments, solid phase nucleic acid of a first solid support species in a collection may have a first primer sequence species and solid phase nucleic acid of a second solid support species in the collection may have a second primer sequence species. A “primer sequence species” as used herein refers to a first primer nucleotide sequence that differs by one nucleotide base or more from a second primer nucleotide sequence when the first and second primer nucleotide sequences are aligned. Thus one primer sequence species may differ from a second primer sequence species by one or more nucleotides when the first and second primer sequences are aligned with one another (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more than 100 nucleotides that are not identical upon alignment).

Primer hybridization sequence (Pr) can be of a length that allows specific hybridization of a primer under the conditions for primer hybridization, in some embodiments. The length of primer hybridization sequence (Pr) is about 10 to about 100 nucleotides, about 10 to about 50 nucleotides or about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides in length in certain embodiments. In certain embodiments, solid phase nucleic acid of a solid support species, or collection of solid supports, includes one or more primer sequence species (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 unique primer hybridization sequence species). Thus, in certain embodiments, solid phase nucleic acid of a solid support species includes one primer hybridization sequence species, and nucleic acid of a collection of solid supports shares a common primer hybridization sequence species.

Non-limiting examples of different solid support species, each having probe sequence species, identification sequence species and a primer hybridization sequence species, are shown in FIGS. 1A, 1B and 1C. FIG. 1A shows a solid support species having a particular combination of solid phase nucleic acid differing by probe sequence species P₁, P₂, P₃, . . . P_(n). In FIG. 1A, the probe sequence species are complementary to subsequences in sample nucleic acid (e.g., probe sequence species P₁, P₂, P₃, . . . P_(n) are complementary to sample nucleic acid subsequences 1, 2, 3, . . . N, respectively). FIG. 1B shows a collection of three solid support species, where each solid support species includes solid phase nucleic acid having a unique identification sequence and different probe sequence species. In FIG. 1B, solid phase nucleic acid of solid support species X has probe sequence species P_(X1), P_(X2), P_(X3), . . . P_(Xn); solid phase nucleic acid of solid support species Y has probe sequence species P_(Y1), P_(Y2), P_(Y3), . . . P_(Yn); and solid phase nucleic acid of solid support species Z has probe sequence species P_(Z1), P_(Z2), P_(Z3), . . . P_(Zn). FIG. 1C shows a collection of three solid support species, where each solid support species includes solid phase nucleic acid having a unique identification sequence and the same probe sequence species.

Probe, identification and primer hybridization sequences in solid phase nucleic acid can be arranged in any suitable orientation with respect to one another for performing the methods described herein. Any two of these sequences may be contiguous or may be separated by an intervening sequence of a suitable length (e.g., an intervening sequence of about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more than 100 nucleotides). In certain embodiments, the sequences are contiguous and in the following orientation: 5′-(primer sequence)-(identification sequence)-(probe sequence)-3′.
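The contiguous 5′-(primer sequence)-(identification sequence)-(probe sequence)-3′ arrangement can be modeled as a simple concatenation. The sketch below is illustrative only: the class name, function name and any sequences passed to them are hypothetical, and the segment lengths are assumed to be known in advance.

```python
from typing import NamedTuple


class SolidPhaseOligo(NamedTuple):
    """Three contiguous segments, each written 5' to 3'."""
    primer: str
    identification: str
    probe: str

    def sequence(self) -> str:
        # 5'-(primer)-(identification)-(probe)-3', with no intervening sequence.
        return self.primer + self.identification + self.probe


def split_contiguous(sequence: str, primer_len: int, id_len: int) -> SolidPhaseOligo:
    """Recover the three segments from a contiguous oligo of known segment lengths."""
    if primer_len < 0 or id_len < 0 or primer_len + id_len > len(sequence):
        raise ValueError("segment lengths do not fit the sequence")
    id_end = primer_len + id_len
    return SolidPhaseOligo(sequence[:primer_len], sequence[primer_len:id_end], sequence[id_end:])
```

Embodiments with intervening sequences between segments would need the spacer lengths as additional inputs; this sketch covers only the contiguous case.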

Solid supports having solid phase nucleic acids may be provided in any convenient form for contacting a sample nucleic acid, such as solid or liquid form, for example. In certain embodiments, a solid support may be provided in a liquid form optionally comprising one or more other components, which include without limitation one or more buffers or salts. Solid supports of a collection may be provided in one container, or may be distributed across multiple containers.

Solid supports may be provided in an array in certain embodiments, or instructions may be provided to arrange solid supports in an array on a substrate. The term “array” as used herein may refer to an arrangement of sample locations on a single two-dimensional solid support, or an arrangement of solid supports across a two-dimensional surface. An array may be of any convenient general shape (e.g., circular, oval, square, rectangular). An array may be referred to as an “X by Y array” for square or rectangular arrays, where the array includes X number of sample locations or solid supports in one dimension and Y number of sample locations or solid supports in a perpendicular dimension. An array may be symmetrical (e.g., a 16 by 16 array) or non-symmetrical (e.g., an 8 by 16 array). An array may include any convenient number of sample locations or solid supports in any suitable arrangement. For example, X or Y independently can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 in some embodiments.

An array may contain one solid support species or multiple solid support species from a collection. The array can be arranged on any substrate suitable for sequence analysis or manufacture processes described herein. Examples of substrates include without limitation flat substrates, filter substrates, wafer substrates, etched substrates, substrates having multiple wells or pits (e.g., microliter (about 1 microliter, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900 and up to about 999 microliter volume), nanoliter (1 nanoliter, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900 and up to about 999 nanoliter volume), picoliter (1 picoliter, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400, 500, 600, 700, 800, 900 and up to about 999 picoliter volume) wells or pits; wells having filter bottoms), substrates having one or more channels, substrates having one or more electrodes, and the like, and combinations thereof. Wells or pits of multiple well and pit substrates may contain one or more solid support units (e.g., each unit being a single bead or particle). Substrates can comprise or consist essentially of a suitable material for conducting sequence analysis or manufacture processes described herein, including without limitation, fiber (e.g., fiber filters), glass (e.g., glass surfaces, fiber optic surfaces), metal (e.g., steel, gold, silver, aluminum, silicon and copper; metal coating), plastic (e.g., polyethylene, polypropylene, polyamide, polyvinylidenedifluoride), silicon and the like. In certain embodiments, the array can be a microarray or a nanoarray. A “nanoarray” often is an array in which solid support units are separated by about 0.1 nanometers to about 10 micrometers, for example from about 1 nanometer to about 1 micrometer (e.g., about 0.1 nanometers, 0.5, 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900 nanometers, 1 micrometer, 2, 3, 4, 5, 6, 7, 8, 9, and up to about 10 micrometers). A “microarray” is an array in which solid support units are separated by more than 1 micrometer. The density of solid support units on arrays often is at least 100/cm², and can be 100/cm² to about 10,000/cm², 100/cm² to about 1,000/cm² or about 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 solid support units/cm².
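To relate the stated densities to the stated separations, the following sketch converts a density in solid support units per cm² into a center-to-center spacing. It assumes, purely for illustration, that the units sit on a regular square grid; the text does not specify any particular layout, and the function name is hypothetical.

```python
import math

UM_PER_CM = 10_000.0  # micrometers per centimeter


def grid_spacing_um(units_per_cm2: float) -> float:
    """Center-to-center spacing, in micrometers, of units on a square grid.

    On a square grid each unit occupies 1/density of area, so the pitch is
    the square root of that area.
    """
    if units_per_cm2 <= 0:
        raise ValueError("density must be positive")
    return UM_PER_CM / math.sqrt(units_per_cm2)
```

Under this assumption, a density of 100/cm² corresponds to a pitch of 1,000 micrometers and 10,000/cm² to a pitch of 100 micrometers, both of which fall in the "more than 1 micrometer" microarray range.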

Single Molecules of Sample Nucleic Acid

In certain methods described herein, sample nucleic acid and solid support are contacted under conditions in which a single molecule of sample nucleic acid hybridizes to a single molecule of a solid support. That is, in some embodiments hybridization conditions can be optimized to allow a single molecule of sample nucleic acid per solid support (e.g., bead or particle), or to allow more than one sample nucleic acid species to hybridize per solid support (e.g., beads or particles are configured to have more than one species of primer sequence, identification sequence, probe sequence, or combinations thereof). In some embodiments a single molecule of nucleic acid sample can be hybridized per solid support under dilute DNA concentration conditions where hybridization of only one molecule of sample nucleic acid per bead is favored. In some embodiments, hybridization conditions can be configured to include only one molecule of sample nucleic acid in the hybridization step. Such conditions can include providing the solid support molecules and a single molecule of sample nucleic acid in a “microreactor” in certain embodiments. Such conditions also include providing a mixture in which the sample nucleic acid molecule can hybridize to solid phase nucleic acid on the solid support.

As used herein, the term “microreactor” refers to a partitioned space in which a single molecule of sample nucleic acid can hybridize to a solid support molecule. In some embodiments, the microreactor volume is large enough to accommodate one solid support bead in the microreactor and small enough to exclude the presence of two or more beads in the microreactor. Examples of microreactors include without limitation an emulsion globule (described hereafter) and a void in a substrate. A void in a substrate can be a pit, a pore or a well (e.g., microwell, nanowell, picowell, micropore, or nanopore) in a substrate constructed from a solid material useful for containing fluids (e.g., plastic (e.g., polypropylene, polyethylene, polystyrene) or silicon) in certain embodiments. Emulsion globules are partitioned by an immiscible phase as described in greater detail hereafter. A single molecule of sample nucleic acid can be provided in a microreactor by contacting sample nucleic acid molecules with an excess (e.g., molar excess) of solid support molecules. In certain embodiments, the excess amount (e.g., molar amount) of solid support is about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000 times, or more, the amount of sample nucleic acid.
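One way to see why an excess of solid support favors single-molecule occupancy is a simple occupancy estimate. The sketch below assumes that sample molecules distribute randomly and independently over the solid supports, so that the number per support follows a Poisson distribution with mean 1/excess. That statistical model is an assumption made here for illustration and is not stated in the text; the function names are hypothetical.

```python
import math


def occupancy_probabilities(support_excess: float) -> dict:
    """Poisson estimate of sample molecules per solid support.

    support_excess is the ratio of solid supports to sample molecules
    (e.g., 10 for a 10-fold excess), giving a mean of 1/support_excess
    sample molecules per support.
    """
    if support_excess <= 0:
        raise ValueError("support_excess must be positive")
    mean = 1.0 / support_excess
    p_empty = math.exp(-mean)
    p_single = mean * p_empty
    return {"empty": p_empty, "single": p_single, "multiple": 1.0 - p_empty - p_single}


def single_fraction_of_occupied(support_excess: float) -> float:
    """Among supports carrying any sample molecule, the fraction carrying exactly one."""
    p = occupancy_probabilities(support_excess)
    return p["single"] / (p["single"] + p["multiple"])
```

Under this model, larger excesses leave more supports empty but make it increasingly likely that any occupied support carries exactly one sample molecule.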

The term “emulsion” as used herein refers to a mixture of two immiscible and unblendable substances, in which one substance (the dispersed phase) often is dispersed in the other substance (the continuous phase). The dispersed phase can be an aqueous solution (i.e., a solution comprising water) in certain embodiments. In some embodiments, the dispersed phase is composed predominantly of water (e.g., greater than 70%, greater than 75%, greater than 80%, greater than 85%, greater than 90%, greater than 95%, greater than 97%, greater than 98% and greater than 99% water (by weight)). Each discrete portion of a dispersed phase, such as an aqueous dispersed phase, is referred to herein as a “globule” or “microreactor.” A globule sometimes may be spheroidal, substantially spheroidal or semi-spheroidal in shape, in certain embodiments.

The terms “emulsion apparatus” and “emulsion component(s)” as used herein refer to apparatus and components that can be used to prepare an emulsion. Non-limiting examples of emulsion apparatus include counter-flow, cross-current, rotating drum and membrane apparatus suitable for use to prepare an emulsion. An emulsion component forms the continuous phase of an emulsion in certain embodiments, and includes without limitation a substance immiscible with water, such as a component comprising or consisting essentially of an oil (e.g., a heat-stable, biocompatible oil (e.g., light mineral oil)). A biocompatible emulsion stabilizer can be utilized as an emulsion component. Emulsion stabilizers include without limitation Atlox 4912, Span 80 and other biocompatible surfactants.

In some embodiments, components useful for biological reactions can be included in the dispersed phase. Globules of the emulsion can include (i) a solid support unit (e.g., one bead or one particle); (ii) sample nucleic acid molecule; and (iii) a sufficient amount of extension agents to elongate solid phase nucleic acid and amplify the elongated solid phase nucleic acid (e.g., extension nucleotides, polymerase, primer). Inactive globules in the emulsion may include a subset of these components (e.g., solid support and extension reagents and no sample nucleic acid) and some can be empty (i.e., some globules will include no solid support, no sample nucleic acid and no extension agents).

Emulsions may be prepared using suitable methods (e.g., Nakano et al., “Single-molecule PCR using water-in-oil emulsion,” Journal of Biotechnology 102 (2003) 117-124). Emulsification methods include without limitation adjuvant methods, counter-flow methods, cross-current methods, rotating drum methods, membrane methods, and the like. In certain embodiments, an aqueous reaction mixture containing a solid support (hereafter the “reaction mixture”) is prepared and then added to a biocompatible oil. The reaction mixture can contain (i) a solid support or solid support collection; (ii) sample nucleic acid; (iii) extension agents and (iv) one or more primers in certain embodiments. Each of these components can be mixed in any suitable order to prepare the reaction mixture. In certain embodiments, the reaction mixture may be added dropwise into a spinning mixture of biocompatible oil (e.g., light mineral oil (Sigma)) and allowed to emulsify. In some embodiments, the reaction mixture may be added dropwise into a cross-flow of biocompatible oil. The size of aqueous globules in the emulsion can be adjusted, such as by varying the flow rate and speed at which the components are added to one another, for example.

The size of emulsion globules can be selected based on two competing factors in certain embodiments: (i) globules are sufficiently large to encompass one solid support molecule, one sample nucleic acid molecule, and sufficient extension agents for the degree of elongation and amplification required; and (ii) globules are sufficiently small so that a population of globules can be amplified by conventional laboratory equipment (e.g., thermocycling equipment, test tubes, incubators and the like). Globules in the emulsion can have a nominal, mean or average diameter of about 5 microns to about 500 microns, about 10 microns to about 350 microns, about 50 to 250 microns, about 100 microns to about 200 microns, or about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400 or 500 microns in certain embodiments.
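The diameter ranges above translate into reaction volumes through the volume of a sphere, V = πd³/6. The following sketch performs that conversion, assuming an ideally spherical globule; the function name is hypothetical.

```python
import math

CUBIC_UM_PER_PL = 1_000.0  # 1 picoliter = 1,000 cubic micrometers


def globule_volume_pl(diameter_um: float) -> float:
    """Volume in picoliters of an ideal sphere with the given diameter in microns."""
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    volume_cubic_um = math.pi * diameter_um ** 3 / 6.0
    return volume_cubic_um / CUBIC_UM_PER_PL
```

Because volume scales with the cube of diameter, the 100-fold span from 5 microns to 500 microns corresponds to a million-fold span in volume, from roughly 0.07 picoliters to roughly 65 nanoliters.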

Sample nucleic acid, solid support(s), extension agents and emulsion component(s) can be mixed in any suitable manner and in any suitable ratios to carry out the methods described herein, including without limitation, manual and automated means (e.g., biological workstations). Any suitable ratio of solid support to sample nucleic acid can be utilized to obtain globules having one sample nucleic acid per solid support unit, and in some embodiments, a ratio of solid support concentration to sample nucleic acid concentration is equal to or greater than 10:1, and in some embodiments, the ratio is about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000, to 1. In some embodiments, (a) sample nucleic acid may be contacted with a solid support or collection of solid supports under conditions in which sample nucleic acid can hybridize to solid phase nucleic acid, (b) the mixture of (a) can be contacted with extension agents, and (c) the mixture of (b) can be emulsified with a solution immiscible with water (e.g., a biocompatible oil). In certain embodiments, an emulsion can be prepared contemporaneously with contacting the mixture with extension agents.

Hybridization conditions that allow for hybridization of sample nucleic acid to solid phase nucleic acid are known. Non-limiting examples of hybridization conditions include hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2× SSC, 0.1% SDS at 50° C. Another example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2× SSC, 0.1% SDS at 55° C., or 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2× SSC, 0.1% SDS at 60° C. Stringent hybridization conditions sometimes are hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2× SSC, 0.1% SDS at 65° C. Stringency conditions at times are 0.5M sodium phosphate, 7% SDS at 65° C., followed by one or more washes at 0.2× SSC, 1% SDS at 65° C. Hybridization conditions also are described for example in WO 91/08307, entitled “Enhanced capture of target nucleic acid by the use of oligonucleotides covalently attached to polymers,” naming Van Ness, and “Nucleic Acid Hybridization, A Practical Approach,” Ed. Hames and Higgins, IRL Press, 1985.

Amplification

The terms “extension agents” and “extension reagents” as used herein refer to components useful for extending a nucleic acid. Conditions under which nucleic acids can be extended and/or amplified by such agents are known. In certain embodiments, extension agents may include one or more of the following: extension nucleotides, a polymerase and a primer that can hybridize to a primer sequence in solid phase nucleic acid. Extension nucleotides include, in some embodiments, naturally occurring deoxynucleotide triphosphates (dATP, dTTP, dCTP, dGTP, dUTP) and non-naturally occurring nucleotides or nucleotide analogs, such as analogs containing a detectable label (e.g., fluorescent or colorimetric label), for example. Polymerases include, in some embodiments, polymerases for thermocycle amplification (e.g., Taq DNA Polymerase; Q-Bio™ Taq DNA Polymerase (recombinant truncated form of Taq DNA Polymerase lacking 5′-3′ exo activity); SurePrime™ Polymerase (chemically modified Taq DNA polymerase for “hot start” PCR); Arrow™ Taq DNA Polymerase (high sensitivity and long template amplification)) and polymerases for thermostable amplification (e.g., RNA polymerase for transcription-mediated amplification (TMA) described at World Wide Web URL “gen-probe.com/pdfs/tma_whiteppr.pdf”). Other enzyme components can be added, such as reverse transcriptase for TMA reactions, for example.

A primer nucleic acid may be of any length suitable for hybridizing to a primer hybridization sequence in solid phase nucleic acid and performing sequence analysis processes described herein. A primer in some embodiments may be about 10 to about 100 nucleotides, about 10 to about 70 nucleotides, about 10 to about 50 nucleotides, about 15 to about 30 nucleotides, or about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 nucleotides in length. A primer may be composed of naturally occurring and/or non-naturally occurring nucleotides, or a mixture thereof. A primer often includes a nucleotide subsequence that is complementary to a solid phase nucleic acid primer hybridization sequence or substantially complementary to a solid phase nucleic acid primer hybridization sequence (e.g., about 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% identical to the primer hybridization sequence complement when aligned). A primer may contain a nucleotide subsequence not complementary to or not substantially complementary to a solid phase nucleic acid primer hybridization sequence (e.g., at the 3′ or 5′ end of the nucleotide subsequence in the primer complementary to or substantially complementary to the solid phase primer hybridization sequence). A primer, in certain embodiments, may contain a detectable molecule (e.g., a fluorophore, radioisotope, colorimetric agent, particle, enzyme and the like).

In processes provided herein, components of a microreactor may be contacted with extension agents under amplification conditions. The term “amplification conditions” as used herein refers to thermocycle and thermostable conditions that can facilitate amplification of a nucleic acid. Thermostable conditions can be maintained and the type and amount of amplification generally is dependent on the extension agents added to the mixture (e.g., primers, RNA polymerase and reverse transcriptase components for TMA (described above)). Thermocycle conditions generally involve repeating temperature fluctuation cycles, and apparatus for effecting such cycles are available. A non-limiting example of thermocycle conditions is treating the sample at 95° C. for 5 minutes; repeating forty-five cycles of 95° C. for 1 minute, 59° C. for 1 minute 10 seconds, and 72° C. for 1 minute 30 seconds; and then treating the sample at 72° C. for 5 minutes. Multiple cycles frequently are performed using a commercially available thermal cycler (e.g., Applied Biosystems 2720 thermal cycler apparatus). In certain embodiments, an emulsified mixture may be subjected to thermocycle conditions for linear amplification using one primer that hybridizes to a solid phase nucleic acid primer hybridization sequence.
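
The non-limiting thermocycle example above can be written down as a small program description, which makes the total hold time easy to check. The sketch below is illustrative only: the `Step` type and function name are hypothetical, the 59° C. step is read as 1 minute 10 seconds, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold time in seconds


# Values taken from the example in the text; ramp times are not modeled.
INITIAL = [Step(95, 5 * 60)]
CYCLE = [Step(95, 60), Step(59, 70), Step(72, 90)]
FINAL = [Step(72, 5 * 60)]
N_CYCLES = 45


def total_seconds(initial, cycle, n_cycles, final):
    """Sum the hold times of the whole thermocycle program."""
    hold = lambda steps: sum(s.seconds for s in steps)
    return hold(initial) + n_cycles * hold(cycle) + hold(final)


if __name__ == "__main__":
    t = total_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"total hold time: {t} s ({t / 60:.0f} min)")
```

Under these assumptions the program holds for 10,500 seconds (175 minutes) in total.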

An amplification product for signal analysis can be of any length suitable for sequence analysis methods. In certain embodiments, an amplification product can be about 5 to about 10,000 nucleotides in length, about 10 to about 1,000 nucleotides in length, about 10 to about 100 nucleotides in length, about 10 to about 50 nucleotides in length, or about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 nucleotides in length. An amplification product may include naturally occurring nucleotides, non-naturally occurring nucleotides, nucleotide analogs and the like and combinations of the foregoing. An amplification product often has a nucleotide sequence that is identical to or substantially identical to a sample nucleic acid nucleotide sequence or complement thereof. A “substantially identical” nucleotide sequence in an amplification product will generally have a high degree of sequence identity to the sample nucleotide sequence or complement thereof (e.g., about 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% sequence identity), and variations often will be a result of infidelity of the polymerase used for extension and/or amplification.
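
To make the percent sequence identity figures above concrete, the following minimal sketch computes identity between two sequences that have already been aligned to equal length. The definition used (matching columns divided by alignment length, with "-" as a gap symbol) is one common convention and is an assumption here; the document does not prescribe a particular formula, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    sample = "ACGTACGTAC"
    product = "ACGTTCGTAC"  # one substitution, e.g. from polymerase infidelity
    print(percent_identity(sample, product))
```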

Amplification products (e.g., amplification products of FIG. 2) may be contacted with additional amplification agents and/or subjected to further amplification conditions in certain embodiments, such as exponential amplification that involves more than one primer and thermocycling, for example. Any suitable amplification process may be utilized, such as amplification methods for use with the pyrosequencing and sequencing by ligation methodologies described hereafter, for example, in certain embodiments.

In certain embodiments, linear amplification products are analyzed directly without further amplification by another process (e.g., linear amplification products of FIG. 2 are not amplified further by an exponential amplification process). In some embodiments, sample nucleic acid, extended solid phase nucleic acid and/or amplification products are not ligated to one or more heterologous nucleic acids (e.g., in contrast to methods described in U.S. Patent Application Publication No. 20040110191, entitled “Comparative analysis of nucleic acids using population tagging,” naming Winkler et al.; U.S. Patent Application Publication No. 20050214825, entitled “Multiplex sample analysis on universal arrays,” naming Stuelpnagel; and Nakano et al. “Single-molecule PCR using water-in-oil emulsion”; Journal of Biotechnology 102 (2003) 117-124), such as heterologous nucleic acids that hybridize to an amplification primer. Nucleic acid sequences of amplification products that are not further amplified may be analyzed by a direct sequence analysis method (e.g., single-molecule sequencing methodology described hereafter).

Amplification products may be released from a solid support in certain embodiments. A suitable method for releasing an amplification product from a solid support can be utilized, such as by heating the solid support (e.g., heating to about 95 degrees C.), exposing the solid support to an amount of a chaotrope (e.g., guanidinium HCl) sufficient to release the amplified nucleic acid, and the like, for example, in some embodiments.

FIG. 2 illustrates a process embodiment in which a sample nucleic acid molecule (S), provided as nucleic acid fragments in some embodiments, can be hybridized to a solid support described herein. Sample nucleic acid molecule (S) may include a nucleotide sequence (i.e., subseq.N) that hybridizes to a complementary probe sequence (P_(N)) of the solid phase nucleic acid. Solid phase nucleic acid hybridized to sample nucleic acid (S) can be extended (e.g., illustrated in FIG. 2 as A) and “extended solid phase nucleic acid” is generated. In some embodiments, extended solid phase nucleic acid may include a nucleotide subsequence complementary to a target nucleotide sequence in the sample nucleic acid molecule. Extended solid phase nucleic acid may be amplified by hybridizing primer (Pr′), which is complementary to primer sequence (Pr) of the extended solid phase nucleic acid, in the presence of amplification/extension reagents (illustrated in FIG. 2 as B). The hybridized primer is extended, thereby generating an amplification product. The amplification product contains the primer nucleotide sequence (Pr′), a nucleotide sequence complementary to the identification sequence in solid phase nucleic acid, and a nucleotide sequence identical to or substantially identical to a target nucleotide sequence in the sample nucleic acid molecule. This linear amplification product can be released from the solid support for sequence analysis (e.g., illustrated in FIG. 2 as C).

Sequence Analysis

Amplification products generated by processes described herein may be subject to sequence analysis. The term “sequence analysis” as used herein refers to determining a nucleotide sequence of an amplification product. The entire sequence or a partial sequence of an amplification product can be determined, and the determined nucleotide sequence is referred to herein as a “read.” A read may be obtained with or without further amplification of amplification products resulting from extension of a primer that hybridizes to the primer hybridization sequence in solid phase nucleic acid. For example, linear amplification products may be analyzed directly without further amplification in some embodiments (e.g., by using single-molecule sequencing methodology (described in greater detail hereafter)). In certain embodiments, linear amplification products may be subject to further amplification and then analyzed (e.g., using sequencing by ligation or pyrosequencing methodology (described in greater detail hereafter)). Reads may be subject to different types of sequence analysis.

In certain sequence analysis embodiments, reads may be used to construct a larger nucleotide sequence, which can be facilitated by identifying overlapping sequences in different reads and by using identification sequences in the reads. Such sequence analysis methods and software for constructing larger sequences from reads are known (e.g., Venter et al., Science 291: 1304-1351 (2001)). Specific reads, partial nucleotide sequence constructs, and full nucleotide sequence constructs may be compared between nucleotide sequences within a sample nucleic acid (i.e., internal comparison) or may be compared with a reference sequence (i.e., reference comparison) in certain sequence analysis embodiments. Internal comparisons sometimes are performed in situations where a sample nucleic acid is prepared from multiple samples or from a single sample source that contains sequence variations. Reference comparisons sometimes are performed when a reference nucleotide sequence is known and an objective is to determine whether a sample nucleic acid contains a nucleotide sequence that is substantially similar to, the same as, or different from a reference nucleotide sequence. Sequence analysis is facilitated by sequence analysis apparatus and components.
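
The idea of constructing a larger sequence from overlapping reads can be illustrated with a toy greedy merge. This is a minimal sketch under simplifying assumptions (error-free reads, same strand, exact suffix/prefix overlaps); it is not the assembly software referenced above, it does not use identification sequences, and all names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ACGTAC", "TACGGA", "GGATTT"]))
```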

In some embodiments, target nucleic acid species can be further analyzed by nucleotide sequencing. Any suitable sequencing method can be utilized. In some embodiments, nucleotide sequencing may be by single nucleotide sequencing methods and processes. Single nucleotide sequencing methods involve contacting sample nucleic acid and solid support under conditions in which a single molecule of sample nucleic acid hybridizes to a single molecule of a solid support. Such conditions can include providing the solid support molecules and a single molecule of sample nucleic acid in a “microreactor.” Such conditions also can include providing a mixture in which the sample nucleic acid molecule can hybridize to solid phase nucleic acid on the solid support. Single nucleotide sequencing methods useful in the embodiments described herein are described in U.S. Provisional Patent Application Ser. No. 61/021,871, filed Jan. 17, 2008, and incorporated herein by reference in its entirety.

The terms “sequence analysis apparatus” and “sequence analysis component(s)” used herein refer to apparatus, and one or more components used in conjunction with such apparatus, that can be used to determine a nucleotide sequence from amplification products resulting from processes described herein (e.g., linear and/or exponential amplification products). Non-limiting examples of current sequence analysis apparatus and components include, without limitation, systems that involve (i) sequencing by ligation of dye-modified probes (e.g., including cyclic ligation and cleavage), (ii) pyrosequencing, and (iii) single-molecule sequencing. An amplification product generated by a process described herein (e.g., released linear amplification product in FIG. 2) can be considered a “study nucleic acid” for purposes of analyzing a nucleotide sequence by such sequence analysis apparatus and components. Examples of sequencing platforms include, without limitation, the 454 platform (Roche) (Margulies, M. et al. 2005 Nature 437, 376-380), Illumina Genome Analyzer (or Solexa platform) or SOLiD System (Applied Biosystems) or the Helicos True Single Molecule DNA sequencing technology (Harris T D et al. 2008 Science, 320, 106-109), the single molecule, real-time (SMRT™) technology of Pacific Biosciences, and nanopore sequencing (Soni G V and Meller A. 2007 Clin Chem 53: 1996-2001). Such platforms allow sequencing of many nucleic acid molecules isolated from a specimen at high orders of multiplexing in a parallel manner (Dear Brief Funct Genomic Proteomic 2003; 1: 397-416). Each of these platforms allows sequencing of clonally expanded or non-amplified single molecules of nucleic acid fragments.

Sequencing by ligation is another nucleic acid sequencing method. Sequencing by ligation relies on the sensitivity of DNA ligase to base-pairing mismatch. DNA ligase joins together ends of DNA that are correctly base paired. Combining the ability of DNA ligase to join together only correctly base paired DNA ends, with mixed pools of fluorescently labeled oligonucleotides or primers, enables sequence determination by fluorescence detection. Longer sequence reads may be obtained by including primers containing cleavable linkages that can be cleaved after label identification. Cleavage at the linker removes the label and regenerates the 5′ phosphate on the end of the ligated primer, preparing the primer for another round of ligation. In some embodiments, primers may be labeled with one or more fluorescent labels (e.g., one fluorescent label; 2, 3, or 4 fluorescent labels).

An example of a system that can be used based on sequencing by ligation generally involves the following steps. Clonal bead populations can be prepared in emulsion microreactors containing study nucleic acid (“template”), amplification reaction components, beads and primers. After amplification, templates are denatured and bead enrichment is performed to separate beads with extended templates from undesired beads (e.g., beads with no extended templates). The template on the selected beads undergoes a 3′ modification to allow covalent bonding to the slide, and modified beads can be deposited onto a glass slide. Deposition chambers offer the ability to segment a slide into one, four or eight chambers during the bead loading process. For sequence analysis, primers hybridize to the adapter sequence. A set of four color dye-labeled probes compete for ligation to the sequencing primer. Specificity of probe ligation is achieved by interrogating every 4th and 5th base during the ligation series. Five to seven rounds of ligation, detection and cleavage record the color at every 5th position with the number of rounds determined by the type of library used. Following each round of ligation, a new complementary primer offset by one base in the 5′ direction is laid down for another series of ligations. Primer reset and ligation rounds (5-7 ligation cycles per round) are repeated sequentially five times to generate 25-35 base pairs of sequence for a single tag. With mate-paired sequencing, this process is repeated for a second tag. Such a system can be used to exponentially amplify amplification products generated by a process described herein, e.g., by ligating a heterologous nucleic acid to the first amplification product generated by a process described herein (e.g., a linear amplification product of FIG. 2) and performing emulsion amplification using the same or a different solid support originally used to generate the first amplification product. Such a system also may be used to analyze amplification products directly generated by a process described herein (e.g., a linear amplification product of FIG. 2) by bypassing an exponential amplification process and directly sorting the solid supports described herein on the glass slide.
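
The read length arithmetic in the ligation example (five primer rounds of 5-7 ligation cycles giving 25-35 bases) can be checked with a deliberately simplified model. The sketch below assumes that each ligation cycle records one position every fifth base and that each primer reset shifts the recorded positions by one base; it ignores the dye encoding and the direction of the offset, and the function name is hypothetical.

```python
def interrogated_positions(n_primers=5, cycles_per_round=5, step=5):
    """Return, for each primer round, the template positions recorded in that round.

    Simplified model: round k (k = 0..n_primers-1) records positions
    k, k + step, k + 2*step, ... for the given number of ligation cycles.
    """
    return [
        [offset + step * cycle for cycle in range(cycles_per_round)]
        for offset in range(n_primers)
    ]


def read_length(n_primers=5, cycles_per_round=5, step=5):
    """Number of distinct positions covered across all primer rounds."""
    rounds = interrogated_positions(n_primers, cycles_per_round, step)
    return len({pos for positions in rounds for pos in positions})


if __name__ == "__main__":
    for cycles in (5, 6, 7):
        print(cycles, "cycles per round ->", read_length(cycles_per_round=cycles), "bases")
```

In this model the five offset rounds interleave to cover every position exactly once, which is why the total equals the number of primer rounds multiplied by the number of ligation cycles per round.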

Pyrosequencing is a nucleic acid sequencing method based on sequencing by synthesis, which relies on detection of a pyrophosphate released on nucleotide incorporation. Generally, sequencing by synthesis involves synthesizing, one nucleotide at a time, a DNA strand complementary to the strand whose sequence is being sought. Study nucleic acids may be immobilized to a solid support, hybridized with a sequencing primer, incubated with DNA polymerase, ATP sulfurylase, luciferase, apyrase, adenosine 5′ phosphosulfate and luciferin. Nucleotide solutions are sequentially added and removed. Correct incorporation of a nucleotide releases a pyrophosphate, which interacts with ATP sulfurylase and produces ATP in the presence of adenosine 5′ phosphosulfate, fueling the luciferin reaction, which produces a chemiluminescent signal allowing sequence determination.
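
The sequential addition of nucleotide solutions can be illustrated with an idealized simulation. The sketch below assumes a noise-free signal proportional to the number of nucleotides incorporated in each flow, a fixed dispensation order, and a template string written in the direction the polymerase traverses it; these are illustrative assumptions and the names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrogram(template, flow_order="TACG", max_flows=100):
    """Simulate idealized flows over a template.

    Returns a list of (nucleotide, count) pairs, one per flow, where count is
    the number of bases incorporated (0 when the dispensed nucleotide does not
    pair with the next template base).
    """
    pos = 0
    signals = []
    while pos < len(template) and len(signals) < max_flows:
        nt = flow_order[len(signals) % len(flow_order)]
        count = 0
        # A run of identical template bases is extended within a single flow.
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            count += 1
            pos += 1
        signals.append((nt, count))
    return signals


def call_sequence(signals):
    """Reconstruct the synthesized strand from the per-flow signals."""
    return "".join(nt * count for nt, count in signals)


if __name__ == "__main__":
    flows = pyrogram("AAGCT")
    print(flows)
    print(call_sequence(flows))
```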

An example of a system that can be used based on pyrosequencing generally involves the following steps: ligating an adaptor nucleic acid to a study nucleic acid and hybridizing the study nucleic acid to a bead; amplifying a nucleotide sequence in the study nucleic acid in an emulsion; sorting beads using a picoliter multiwell solid support; and sequencing amplified nucleotide sequences by pyrosequencing methodology (e.g., Nakano et al. “Single-molecule PCR using water-in-oil emulsion;” Journal of Biotechnology 102 (2003) 117-124). Such a system can be used to exponentially amplify amplification products generated by a process described herein, e.g., by ligating a heterologous nucleic acid to the first amplification product generated by a process described herein (e.g., a linear amplification product of FIG. 2) and performing emulsion amplification using the same or a different solid support originally used to generate the first amplification product. Such a system also may be used to analyze amplification products directly generated by a process described herein (e.g., a linear amplification product of FIG. 2) by bypassing an exponential amplification process and directly sorting solid supports described herein on the picoliter multiwell support.

Certain single-molecule sequencing embodiments are based on the principle of sequencing by synthesis, and utilize single-pair Fluorescence Resonance Energy Transfer (single pair FRET) as a mechanism by which photons are emitted as a result of successful nucleotide incorporation. The emitted photons often are detected using intensified or high sensitivity cooled charge-coupled devices in conjunction with total internal reflection microscopy (TIRM). Photons are only emitted when the introduced reaction solution contains the correct nucleotide for incorporation into the growing nucleic acid chain that is synthesized as a result of the sequencing process. In FRET based single-molecule sequencing, energy is transferred between two fluorescent dyes, sometimes polymethine cyanine dyes Cy3 and Cy5, through long-range dipole interactions. The donor is excited at its specific excitation wavelength and the excited state energy is transferred non-radiatively to the acceptor dye, which in turn becomes excited. The acceptor dye eventually returns to the ground state by radiative emission of a photon. The two dyes used in the energy transfer process represent the “single pair” in single pair FRET. Cy3 often is used as the donor fluorophore and often is incorporated as the first labeled nucleotide. Cy5 often is used as the acceptor fluorophore and is used as the nucleotide label for successive nucleotide additions after incorporation of a first Cy3 labeled nucleotide. The fluorophores generally are within 10 nanometers of each other for energy transfer to occur successfully.
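
The distance dependence behind the 10 nanometer statement can be sketched with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer efficiency is 50%. The relation and the default R0 of 5.4 nm used below are not taken from this document; the R0 value is an assumed, illustrative figure for a Cy3/Cy5 pair and the function name is hypothetical.

```python
def fret_efficiency(r_nm, r0_nm=5.4):
    """Forster transfer efficiency for donor-acceptor distance r_nm.

    r0_nm is the Forster radius (distance of 50% efficiency); the default is
    an assumed illustrative value, not a figure from this document.
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    for r in (2.0, 5.4, 8.0, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

Because efficiency falls with the sixth power of distance, transfer is nearly complete at short range and becomes negligible as the separation approaches 10 nm under the assumed R0.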

An example of a system that can be used based on single-molecule sequencing generally involves hybridizing a primer to a study nucleic acid to generate a complex; associating the complex with a solid phase; iteratively extending the primer by a nucleotide tagged with a fluorescent molecule; and capturing an image of fluorescence resonance energy transfer signals after each iteration (e.g., U.S. Pat. No. 7,169,314; Braslavsky et al., PNAS 100(7): 3960-3964 (2003)). Such a system can be used to directly sequence amplification products generated by processes described herein (e.g., released linear amplification product in FIG. 2). In some embodiments, the released linear amplification product can be hybridized to a primer that contains sequences complementary to immobilized capture sequences present on a solid support, a bead or glass slide for example. Hybridization of the complexes of primer and released linear amplification product with the immobilized capture sequences immobilizes released linear amplification products to solid supports for single pair FRET based sequencing by synthesis. The primer often is fluorescent, so that an initial reference image of the surface of the slide with immobilized nucleic acids can be generated. The initial reference image is useful for determining locations at which true nucleotide incorporation is occurring. Fluorescence signals detected in array locations not initially identified in the “primer only” reference image are discarded as non-specific fluorescence. Following immobilization of the complexes of primer and released linear amplification product, the bound nucleic acids often are sequenced in parallel by the iterative steps of a) polymerase extension in the presence of one fluorescently labeled nucleotide, b) detection of fluorescence using appropriate microscopy, TIRM for example, c) removal of fluorescent nucleotide, and d) return to step a) with a different fluorescently labeled nucleotide.
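
The role of the “primer only” reference image in steps a) through d) can be sketched as a simple filter over per-cycle detections. In the sketch below, images are reduced to sets of (row, column) spot coordinates, one labeled nucleotide is offered per cycle, and at most one incorporation per spot per cycle is assumed; the data layout and names are hypothetical simplifications.

```python
def call_reads(reference_spots, cycles):
    """Build one read per reference spot from per-cycle fluorescence detections.

    reference_spots: set of (row, col) locations seen in the primer-only image.
    cycles: list of (nucleotide, set_of_spots_that_fluoresced) in flow order.
    Returns (reads, discarded) where discarded counts signals at locations
    absent from the reference image (treated as non-specific fluorescence).
    """
    reads = {spot: "" for spot in reference_spots}
    discarded = 0
    for nucleotide, spots in cycles:
        for spot in spots:
            if spot in reads:
                reads[spot] += nucleotide
            else:
                discarded += 1
    return reads, discarded


if __name__ == "__main__":
    reference = {(1, 1), (2, 5)}
    cycles = [
        ("A", {(1, 1), (9, 9)}),  # (9, 9) is not in the reference image
        ("C", {(2, 5)}),
        ("G", {(1, 1), (2, 5)}),
        ("T", set()),
    ]
    reads, discarded = call_reads(reference, cycles)
    print(reads, discarded)
```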

Nucleotide sequence analysis can include, in some embodiments, fixing nucleotide sequence information in tangible or electronic form. Nucleotide sequence information includes without limitation one or more nucleotide sequences (e.g., string(s) of nucleotide bases, full sequences, partial sequences), information pertaining to process(es) used to obtain a sample nucleotide sequence, information pertaining to process(es) used to obtain a sample nucleic acid from a sample, and information pertaining to the sample(s) from which sample nucleic acid was obtained (e.g., patient information, population information, location of a sample source). Nucleotide sequence information can be fixed in any tangible or electronic form, including without limitation a physical medium (e.g., paper and the like) or a computer readable medium (e.g., optical and/or magnetic storage or transmission medium, floppy disk, hard disk, random access memory, computer processing unit, facsimile signal, satellite signal, internet, world wide web and the like). Nucleotide sequence information may be fixed in an official or unofficial record (e.g., patient record, insurance record, laboratory notebook, government record (e.g., Centers for Disease Control record) and the like). Sequence information sometimes is stored and organized in a database. In certain embodiments, sequence information may be transferred from one location to another using a physical medium or electronic medium (e.g., transmission from a site in China to a site in the United States or a territory thereof).

Kits

Kits often comprise one or more containers that contain one or more components described herein. A kit comprises one or more components in any number of separate containers, packets, tubes, vials, multiwell plates and the like, or components may be combined in various combinations in such containers. One or more of the following components, for example, may be included in a kit: (i) a solid support having solid phase nucleic acid; (ii) a collection of solid supports having solid phase nucleic acid; (iii) nucleic acid that can be associated with a solid support to generate a solid support having solid phase nucleic acid; (iv) one or more agents that can be used to associate nucleic acid with a solid support to generate a solid support having solid phase nucleic acid; (v) nucleic acid-free solid support(s); (vi) one or more extension agents; (vii) one or more components; (viii) emulsion apparatus and/or emulsion component(s); (ix) nucleic acid amplification apparatus and/or nucleic acid amplification component(s); (x) sequence analysis apparatus and/or sequence analysis component(s); (xi) a substrate containing microreactor wells or pits; and (xii) nucleotide sequence analysis software.

A kit sometimes is utilized in conjunction with a process, and can include instructions for performing one or more processes and/or a description of one or more compositions. A kit may be utilized to carry out a process (e.g., using a solid support) described herein. Instructions and/or descriptions may be in tangible form (e.g., paper and the like) or electronic form (e.g., computer readable file on a tangible medium (e.g., compact disc) and the like) and may be included in a kit insert. A kit also may include a written description of an internet location that provides such instructions or descriptions.

Applications

Processes and solid supports provided herein are useful for several types of analyses, non-limiting examples of which are described hereafter.

1. Microbial Identification

A strain or strains of microorganisms can be identified using processes and solid supports described herein. The microorganism(s) are selected from a variety of organisms including, but not limited to, bacteria, fungi, protozoa, ciliates, and viruses. The microorganisms are not limited to a particular genus, species, strain, subtype or serotype. The microorganisms can be identified by determining sequence variations in a target microorganism sequence relative to one or more reference sequences or samples. The reference sequence(s) can be obtained from, for example, other microorganisms from the same or different genus, species, strain or serotype, or from a host prokaryotic or eukaryotic organism.

Identification and typing of pathogens (e.g., bacterial or viral) is critical in the clinical management of infectious diseases. Precise identity of a microbe is used not only to differentiate a disease state from a healthy state, but is also fundamental to determining the source of the infection and its spread and whether and which antibiotics or other antimicrobial therapies are most suitable for treatment. In addition, treatment can be monitored. Traditional methods of pathogen typing have used a variety of phenotypic features, including growth characteristics, color, cell or colony morphology, antibiotic susceptibility, staining, smell, serotyping and reactivity with specific antibodies to identify microbes (e.g., bacteria). All of these methods require culture of the suspected pathogen, which suffers from a number of serious shortcomings, including high material and labor costs, danger of worker exposure, false positives due to mishandling and false negatives due to low numbers of viable cells or due to the fastidious culture requirements of many pathogens. In addition, culture methods require a relatively long time to achieve diagnosis, and because of the potentially life-threatening nature of such infections, antimicrobial therapy is often started before the results can be obtained. Some organisms cannot be maintained in culture or exhibit prohibitively slow growth rates (e.g., up to 6-8 weeks for Mycobacterium tuberculosis).

In many cases, the pathogens are present in minor amounts and/or are very similar to the organisms that make up the normal flora, and can be indistinguishable from the innocuous strains by the methods cited above. In these cases, determination of the presence of the pathogenic strain can require the higher resolution afforded by the molecular typing methods provided herein. For example, in some embodiments, PCR amplification of a target nucleic acid sequence followed by specific cleavage (e.g., base-specific), followed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, followed by screening for sequence variations as provided herein, allows reliable discrimination of sequences differing by only one nucleotide and combines the discriminatory power of the sequence information generated with the speed of MALDI-TOF MS.

Thus, provided herein is a method for detecting a microbial nucleotide sequence in a sample, which comprises (a) providing a sample nucleic acid (e.g., taken from a subject); (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined in (e), identifying the presence or absence of the microbial nucleotide sequence in the sample nucleic acid. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above, for example.
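
Step (f) of the method above can be illustrated with a minimal presence/absence check over a set of reads. The sketch assumes error-free reads and an exact-match marker sequence, and it checks both strands because a read may correspond to either the target sequence or its complement; the marker, the reads and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_marker(reads, marker):
    """True if any read contains the marker or its reverse complement exactly."""
    rc = reverse_complement(marker)
    return any(marker in read or rc in read for read in reads)


if __name__ == "__main__":
    marker = "GATTACA"                     # hypothetical microbial signature
    reads = ["CCTGTAATCGG", "AAAACCCC"]    # first read carries the reverse complement
    print(contains_marker(reads, marker))
```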

2. Detection of Sequence Variations

Genomic bases of disease and markers thereof can be detected using the processes and solid supports herein. The sequence variation candidates identified by the methods provided herein include sequences containing sequence variations that are polymorphisms. Polymorphisms include both naturally occurring, somatic sequence variations and those arising from mutation. Polymorphisms include but are not limited to: sequence microvariants where one or more nucleotides in a localized region vary from individual to individual, insertions and deletions which can vary in size from one nucleotide to millions of bases, and microsatellite or nucleotide repeats which vary by numbers of repeats. Nucleotide repeats include homogeneous repeats such as dinucleotide, trinucleotide, tetranucleotide or larger repeats, where the same sequence is repeated multiple times, and also heteronucleotide repeats where sequence motifs are found to repeat. For a given locus, the number of nucleotide repeats can vary depending on the individual.
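
Variation in the number of nucleotide repeats at a locus can be illustrated by counting the longest run of back-to-back copies of a motif in a read. This is a minimal sketch assuming error-free reads and a known, exactly repeated motif; the sequences and the function name are hypothetical.

```python
def max_tandem_repeats(seq, motif):
    """Return the largest number of consecutive copies of motif found in seq."""
    if not motif:
        raise ValueError("motif must not be empty")
    m = len(motif)
    best = 0
    for start in range(len(seq) - m + 1):
        count = 0
        pos = start
        # Count copies of the motif laid end to end from this start position.
        while seq[pos:pos + m] == motif:
            count += 1
            pos += m
        best = max(best, count)
    return best


if __name__ == "__main__":
    individual_1 = "TTCACACACAGG"  # four CA repeats
    individual_2 = "TTCACAGG"      # two CA repeats
    print(max_tandem_repeats(individual_1, "CA"), max_tandem_repeats(individual_2, "CA"))
```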

A polymorphic marker or site is the locus at which divergence occurs. Such a site can be as small as one base pair (an SNP). Polymorphic markers include, but are not limited to, restriction fragment length polymorphisms (RFLPs), variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats and other repeating patterns, simple sequence repeats and insertional elements, such as Alu. Polymorphic forms also are manifested as different Mendelian alleles for a gene. Polymorphisms can be observed by differences in proteins, protein modifications, RNA expression modification, DNA and RNA methylation, regulatory factors that alter gene expression and DNA replication, and any other manifestation of alterations in genomic nucleic acid or organelle nucleic acids.

Furthermore, numerous genes have polymorphic regions. Since individuals have any one of several allelic variants of a polymorphic region, individuals can be identified based on the type of allelic variants of polymorphic regions of genes. This can be used, for example, for forensic purposes. In other situations, it is crucial to know the identity of allelic variants that an individual has. For example, allelic differences in certain genes, for example, major histocompatibility complex (MHC) genes, are involved in graft rejection or graft versus host disease in bone marrow transplantation. Accordingly, it is highly desirable to develop rapid, sensitive, and accurate methods for determining the identity of allelic variants of polymorphic regions of genes or genetic lesions. Method or kit embodiments, as provided herein, can be used to genotype a subject by determining the identity of one or more allelic variants of one or more polymorphic regions in one or more genes or chromosomes of the subject. Genotyping a subject using a method as provided herein can be used for forensic or identity testing purposes and the polymorphic regions can be present in mitochondrial genes or can be short tandem repeats.

Single nucleotide polymorphisms (SNPs) are generally biallelic systems, that is, there are two alleles that an individual can have for any particular marker. This means that the information content per SNP marker is relatively low when compared to microsatellite markers, which can have upwards of 10 alleles. SNPs also tend to be very population-specific; a marker that is polymorphic in one population sometimes is not very polymorphic in another. SNPs, found approximately every kilobase (see Wang et al. (1998) Science 280:1077-1082), offer the potential for generating very high density genetic maps, which will be extremely useful for developing haplotyping systems for genes or regions of interest. Because of the nature of SNPs, they can in fact be the polymorphisms associated with the disease phenotypes under study. The low mutation rate of SNPs also makes them excellent markers for studying complex genetic traits.
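The comparison of information content between a biallelic SNP and a multi-allelic microsatellite can be illustrated numerically. The following is a minimal sketch, not part of the described methods: it assumes equally frequent alleles (an idealization) and uses Shannon entropy and expected heterozygosity as two common measures of marker informativeness. The function names are hypothetical.

```python
import math


def allele_entropy_bits(freqs):
    """Shannon entropy, in bits, of an allele frequency distribution."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def expected_heterozygosity(freqs):
    """Probability that two alleles drawn at random differ."""
    total = sum(freqs)
    return 1.0 - sum((f / total) ** 2 for f in freqs)


# Assumption: alleles are equally frequent at each marker.
snp = [0.5, 0.5]          # biallelic SNP
microsatellite = [0.1] * 10  # microsatellite with 10 alleles

for name, freqs in (("SNP", snp), ("microsatellite", microsatellite)):
    print(name, allele_entropy_bits(freqs), expected_heterozygosity(freqs))
```

Under these assumptions a single SNP carries at most one bit per allele observed, whereas a ten-allele marker carries more, which is one reason several linked SNPs are often combined into haplotypes.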

Much of the focus of genomics has been on the identification of SNPs, which are important for a variety of reasons. SNPs allow indirect testing (association of haplotypes) and direct testing (functional variants). SNPs are the most abundant and stable genetic markers. Common diseases are best explained by common genetic alterations, and the natural variation in the human population aids in understanding disease, therapy and environmental interactions.

Thus, provided herein is a method for detecting a sequence variation in sample nucleic acid, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined in (e), identifying the presence or absence of the sequence variation in the sample nucleic acid. The sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above, for example.
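Part (f) of the method above, identifying a sequence variation from a determined sequence, can be sketched computationally. The following is an illustrative sketch only: it assumes the determined sequence has already been aligned to a reference of equal length, and it reports substitutions only (no insertions or deletions). The function name and the toy sequences are hypothetical.

```python
def find_substitutions(reference, read):
    """Return (1-based position, reference base, observed base) for each mismatch.

    Assumes `read` is already aligned to `reference` and has the same length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have equal length")
    return [
        (i + 1, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference, read))
        if ref_base != obs_base
    ]


# Toy example: a single G-to-C substitution at position 3.
variants = find_substitutions("ACGTAC", "ACCTAC")
print(variants)
```

An empty result indicates that no substitution relative to the reference was observed in that read.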

3. Detecting the Presence of Microbial Nucleic Acid Sequences Indicative of an Infection

Processes and solid supports described herein can be used to determine the presence of microbial nucleic acid sequences indicative of an infection by identifying sequence variations that are present in the viral or bacterial nucleic acid sequences relative to one or more reference sequences. The reference sequence(s) can include, but are not limited to, sequences obtained from related non-infectious organisms, or sequences from host organisms.

Viruses, bacteria, fungi and other infectious organisms contain distinct nucleic acid sequences, including sequence variants, which are different from the sequences contained in the host cell, and in some instances different from the sequences of related species, subspecies, serotypes, and the like, which may form part of the normal flora or fauna of the host. A target DNA sequence can be part of a foreign genetic sequence such as the genome of an invading microorganism, including, for example, bacteria and their phages, viruses, fungi, protozoa, and the like. The processes provided herein are particularly applicable for distinguishing between different variants or strains of a microorganism (e.g., pathogenic, less pathogenic, resistant versus non-resistant and the like) in order, for example, to choose an appropriate therapeutic intervention. Examples of disease-causing viruses that infect humans and animals and that can be detected by a disclosed process include but are not limited to Retroviridae (e.g., human immunodeficiency viruses such as HIV-1 (also referred to as HTLV-III, LAV or HTLV-III/LAV; Ratner et al., Nature, 313:227-284 (1985); Wain Hobson et al., Cell, 40:9-17 (1985)), HIV-2 (Guyader et al., Nature, 328:662-669 (1987); European Patent Publication No. 0 269 520; Chakrabarti et al., Nature, 328:543-547 (1987); European Patent Application No. 0 655 501), and other isolates such as HIV-LP (International Publication No. WO 94/00562)); Picornaviridae (e.g., polioviruses, hepatitis A virus (Gust et al., Intervirology, 20:1-7 (1983)), enteroviruses, human coxsackie viruses, rhinoviruses, echoviruses); Caliciviridae (e.g., strains that cause gastroenteritis); Togaviridae (e.g., equine encephalitis viruses, rubella viruses); Flaviviridae (e.g., dengue viruses, encephalitis viruses, yellow fever viruses); Coronaviridae (e.g., coronaviruses); Rhabdoviridae (e.g., vesicular stomatitis viruses, rabies viruses); Filoviridae (e.g., ebola viruses); Paramyxoviridae (e.g., parainfluenza viruses, mumps virus, measles virus, respiratory syncytial virus); Orthomyxoviridae (e.g., influenza viruses); Bunyaviridae (e.g., Hantaan viruses, bunya viruses, phleboviruses and Nairo viruses); Arenaviridae (hemorrhagic fever viruses); Reoviridae (e.g., reoviruses, orbiviruses and rotaviruses); Birnaviridae; Hepadnaviridae (Hepatitis B virus); Parvoviridae (parvoviruses); Papovaviridae (papilloma viruses, polyoma viruses); Adenoviridae (most adenoviruses); Herpesviridae (herpes simplex virus type 1 (HSV-1) and HSV-2, varicella zoster virus, cytomegalovirus, herpes viruses); Poxviridae (variola viruses, vaccinia viruses, pox viruses); Iridoviridae (e.g., African swine fever virus); and unclassified viruses (e.g., the etiological agents of spongiform encephalopathies, the agent of delta hepatitis (thought to be a defective satellite of hepatitis B virus), the agents of non-A, non-B hepatitis (class 1=enterally transmitted; class 2=parenterally transmitted, e.g., Hepatitis C), Norwalk and related viruses, and astroviruses).

Examples of infectious bacteria include but are not limited to Helicobacter pylori, Borrelia burgdorferi, Legionella pneumophila, Mycobacteria sp. (e.g., M. tuberculosis, M. avium, M. intracellulare, M. kansasii, M. gordonae), Salmonella, Staphylococcus aureus, Neisseria gonorrhoeae, Neisseria meningitidis, Listeria monocytogenes, Streptococcus pyogenes (Group A Streptococcus), Streptococcus agalactiae (Group B Streptococcus), Streptococcus sp. (viridans group), Streptococcus faecalis, Streptococcus bovis, Streptococcus sp. (anaerobic species), Streptococcus pneumoniae, pathogenic Campylobacter sp., Enterococcus sp., Haemophilus influenzae, Bacillus anthracis, Corynebacterium diphtheriae, Corynebacterium sp., Erysipelothrix rhusiopathiae, Clostridium perfringens, Clostridium tetani, Escherichia coli, Enterobacter aerogenes, Klebsiella pneumoniae, Pasteurella multocida, Bacteroides sp., Fusobacterium nucleatum, Streptobacillus moniliformis, Treponema pallidum, Treponema pertenue, Leptospira, and Actinomyces israelii.

Examples of infectious fungi include but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, and Candida albicans. Other infectious organisms include the bacterium Chlamydia trachomatis and protists such as Plasmodium falciparum and Toxoplasma gondii.

Thus, provided herein is a method for detecting an infectious microbial nucleotide sequence in a sample, which comprises (a) providing a sample nucleic acid (e.g., taken from a subject); (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined by part (e), identifying the presence or absence of the infectious microbial nucleotide sequence in the sample nucleic acid. The microbial nucleotide sequence may be compared to a reference microbial sequence in certain embodiments, and may be used, sometimes in conjunction with other information, to diagnose an infection of the subject. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

4. Detecting the Presence of Specific Viral Nucleic Acid Sequences in a Viral Mixture or Mixed Viral Population

Processes and solid supports described herein can be used to determine the presence of specific viral nucleic acid sequences in a mixture of viral sequences or mixed population of viral sequences (e.g., a “homogeneous” mixture containing only virus from the same genus, or “heterogeneous” populations as might be found in a sample taken from an environmental source or an immuno-deficient organism).

Recent evidence suggests that “viral mixtures” (e.g., mixed populations of virus from either the same or different genus and species, subspecies, cultivar and the like, hepatitis A, B, and C, for example), may lead to increased occurrence of certain diseases, cancer for example. This is particularly evident with hepatitis B virus (HBV) infection. An increase in the severity of the course of the disease and an increase in the recurrence of hepatocellular carcinoma were seen in individuals co-infected with two or three subgenotypes of HBV, particularly subgenotypes C2 and B2 (Yin et al., “Role of hepatitis B virus genotype mixture, subgenotypes C2 and B2 on hepatocellular carcinoma: compared with chronic hepatitis B and asymptomatic carrier state in the same area” Carcinogenesis, 29(9): 1685-1691, 2008). The genetic variability of RNA viruses is also known in certain instances. This genetic variability, for example as seen in the Human Immunodeficiency Virus (HIV), has led to the discovery of “quasi-species” or mixed viral populations, with an increase in drug resistant forms of HIV discovered as a result of recombination between different HIV genotypes, due to anti-retroviral selective pressures.

Mixed viral populations are naturally occurring. It is estimated that oceans of the world contain greater than 22 metric tons of phage and virus particles (e.g., greater than 10³¹ particles, Rohwer and Edwards, “The phage proteomic tree: a genome based taxonomy for phage”, Journal of Bacteriology, 184:4529-4535, 2002), some of which are known to be human pathogens (Griffin et al., Pathogenic human viruses in coastal waters. Clinical Microbiol. Rev. 16:129-143, 2003). Oceans of the world may have an average viral content in the lower range of the viral loads reported for human plasma from viremic patients. Even assuming minimal mutation and recombination rates, the equivalent of hundreds or thousands of complete “human genomes” worth of new genetic sequences are created daily. Early identification of potentially new pathogenic sequences, using methods described herein to detect specific viral sequences in viral mixtures or mixed viral populations, may prove crucial to developing new and effective treatments. Further, viral populations are present in human populations and environmental samples, and can be assessed by processes and compositions described herein.

Processes provided herein are particularly applicable for distinguishing between different variants or strains, genotypes, or subgenotypes of viruses (e.g., pathogenic, less pathogenic, resistant versus non-resistant and the like) in order, for example, to choose an appropriate therapeutic intervention. Examples of disease-causing viruses that infect humans and animals and that can be detected by a disclosed process include but are not limited to Retroviridae (e.g., human immunodeficiency viruses such as HIV-1 (also referred to as HTLV-III, LAV or HTLV-III/LAV; Ratner et al., Nature, 313:227-284 (1985); Wain Hobson et al., Cell, 40:9-17 (1985)), HIV-2 (Guyader et al., Nature, 328:662-669 (1987); European Patent Publication No. 0 269 520; Chakrabarti et al., Nature, 328:543-547 (1987); European Patent Application No. 0 655 501), and other isolates such as HIV-LP (International Publication No. WO 94/00562)); Picornaviridae (e.g., polioviruses, hepatitis A virus (Gust et al., Intervirology, 20:1-7 (1983)), enteroviruses, human coxsackie viruses, rhinoviruses, echoviruses); Caliciviridae (e.g., strains that cause gastroenteritis); Togaviridae (e.g., equine encephalitis viruses, rubella viruses); Flaviviridae (e.g., dengue viruses, encephalitis viruses, yellow fever viruses); Coronaviridae (e.g., coronaviruses); Rhabdoviridae (e.g., vesicular stomatitis viruses, rabies viruses); Filoviridae (e.g., ebola viruses); Paramyxoviridae (e.g., parainfluenza viruses, mumps virus, measles virus, respiratory syncytial virus); Orthomyxoviridae (e.g., influenza viruses); Bunyaviridae (e.g., Hantaan viruses, bunya viruses, phleboviruses and Nairo viruses); Arenaviridae (hemorrhagic fever viruses); Reoviridae (e.g., reoviruses, orbiviruses and rotaviruses); Birnaviridae; Hepadnaviridae (Hepatitis B virus); Parvoviridae (parvoviruses); Papovaviridae (papilloma viruses, polyoma viruses); Adenoviridae (most adenoviruses); Herpesviridae (herpes simplex virus type 1 (HSV-1) and HSV-2, varicella zoster virus, cytomegalovirus, herpes viruses); Poxviridae (variola viruses, vaccinia viruses, pox viruses); Iridoviridae (e.g., African swine fever virus); and unclassified viruses (e.g., the etiological agents of spongiform encephalopathies, the agent of delta hepatitis (thought to be a defective satellite of hepatitis B virus), the agents of non-A, non-B hepatitis (class 1=enterally transmitted; class 2=parenterally transmitted, e.g., Hepatitis C), Norwalk and related viruses, and astroviruses). In certain embodiments the processes provided herein may be used to detect hepatitis B nucleic acid sequences in a mixture of Hepadnaviridae sequences.

Thus, provided herein is a method for detecting specific viral nucleotide sequences in a viral mixture or mixed viral population sample, which comprises (a) providing a sample nucleic acid (e.g., taken from a subject or ocean); (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined by part (e), identifying the presence or absence of the viral nucleotide sequence in the sample nucleic acid. The viral nucleotide sequence may be compared to a reference viral sequence in certain embodiments, and may be used, sometimes in conjunction with other information, to detect the presence of a specific viral nucleic acid sequence, or for example to diagnose an infection of a subject. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.
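The comparison of determined sequences against reference viral sequences in a mixed population can be sketched as follows. This is a hedged illustration only, not the described method: it assumes each read derives from a single molecule, is the same length as the references, and is assigned to the closest reference by Hamming distance. The reference names and sequences are toy placeholders and do not represent real viral genotypes; `max_mismatches` is a hypothetical parameter.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_reads(reads, references, max_mismatches=1):
    """Count reads per reference; ambiguous or distant reads are 'unassigned'."""
    counts = {name: 0 for name in references}
    counts["unassigned"] = 0
    for read in reads:
        scored = sorted(
            (hamming(read, ref), name)
            for name, ref in references.items()
            if len(ref) == len(read)
        )
        unique_best = len(scored) == 1 or (
            len(scored) > 1 and scored[0][0] < scored[1][0]
        )
        if scored and scored[0][0] <= max_mismatches and unique_best:
            counts[scored[0][1]] += 1
        else:
            counts["unassigned"] += 1
    return counts


# Toy references (hypothetical sequences) and four single-molecule reads.
references = {"variant_1": "ACGTACGT", "variant_2": "ACGAATGA"}
reads = ["ACGTACGT", "ACGTACGA", "ACGAATGA", "GGGGGGGG"]
print(assign_reads(reads, references))
```

Because each read is assumed to originate from one molecule, the per-reference counts give a rough estimate of the composition of the mixture, subject to sampling and sequencing error.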

5. Antibiotic Profiling

Processes and solid supports described herein can be utilized to identify nucleotide changes involved in drug resistance, including antibiotic resistance. Genetic loci involved in resistance to isoniazid, rifampin, streptomycin, fluoroquinolones, and ethionamide have been identified [Heym et al., Lancet 344:293 (1994) and Morris et al., J. Infect. Dis. 171:954 (1995)]. A combination of isoniazid (inh) and rifampin (rif), along with pyrazinamide and ethambutol or streptomycin, is routinely used as the first line of attack against confirmed cases of M. tuberculosis [Banerjee et al., Science 263:227 (1994)]. The increasing incidence of such resistant strains necessitates the development of rapid assays to detect them and thereby reduce the expense and community health hazards of pursuing ineffective, and possibly detrimental, treatments. The identification of some of the genetic loci involved in drug resistance has facilitated the adoption of mutation detection technologies for rapid screening of nucleotide changes that result in drug resistance. In addition, the technology facilitates treatment monitoring and tracking of microbial population structures.

Thus, in some embodiments the target nucleotide sequence identified may be (i) a genetic locus mutated in a drug-resistant organism (e.g., the sequence will be present if a drug-resistant organism is present); (ii) a genetic locus that does not change as a result of drug resistance (e.g., such a sequence from a pathogen will diminish over time if the drug depletes the organism); or (iii) a nucleotide sequence from a particular strain not resistant to the drug. Accordingly, provided herein is a method for determining the presence of drug resistance, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined in (e), detecting the presence or absence of a target nucleic acid sequence indicative of drug resistance. The presence of a sequence indicative of resistance to a first drug may be identified, and an alternative drug may be prescribed. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

6. Identifying Disease Markers

Processes and solid supports described herein can be utilized to rapidly and accurately identify sequence variations that are genetic markers of disease, which can be used to diagnose or determine the prognosis of a disease. Diseases characterized by genetic markers can include, but are not limited to, atherosclerosis, obesity, diabetes, autoimmune disorders, and cancer. Diseases in all organisms have a genetic component, whether inherited or resulting from the body's response to environmental stresses, such as viruses and toxins. The ultimate goal of ongoing genomic research is to use this information to develop new ways to identify, treat and potentially cure these diseases. The first step has been to screen disease tissue and identify genomic changes at the level of individual samples. The identification of these “disease” markers is dependent on the ability to detect changes in genomic markers in order to identify errant genes or sequence variants. Genomic markers (all genetic loci including single nucleotide polymorphisms (SNPs), microsatellites and other noncoding genomic regions, tandem repeats, introns and exons) can be used for the identification of all organisms, including humans. These markers provide a way to not only identify populations but also allow stratification of populations according to their response to disease, drug treatment, resistance to environmental agents, and other factors.

Thus, provided herein is a method for detecting a disease marker nucleotide sequence, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined by part (e), identifying the presence or absence of the disease marker nucleotide sequence in the sample nucleic acid. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

7. Haplotyping

Processes and solid supports described herein can be used to detect haplotypes. In any diploid cell, there are two haplotypes at any gene or other chromosomal segment that contains at least one distinguishing variance. In many well-studied genetic systems, haplotypes are more powerfully correlated with phenotypes than single nucleotide variations. Thus, the determination of haplotypes is valuable for understanding the genetic basis of a variety of phenotypes including disease predisposition or susceptibility, response to therapeutic interventions, and other phenotypes of interest in medicine, animal husbandry, and agriculture.

Haplotyping procedures as provided herein permit selecting a portion of sequence from one of an individual's two homologous chromosomes and genotyping linked SNPs on that portion of sequence. The direct resolution of haplotypes can yield increased information content, improving the diagnosis of any linked disease genes or identifying linkages associated with those diseases.
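Because each determined sequence is derived from a single molecule, the alleles at linked SNP positions in one read belong to the same chromosome. A minimal sketch of reading haplotypes directly from such reads follows. It assumes reads are already aligned to a common start, that SNP positions are given as 0-based offsets, and that a haplotype is reported only when supported by at least `min_reads` reads; all names and the toy reads are hypothetical.

```python
from collections import Counter


def haplotype_of_read(read, snp_offsets):
    """Concatenate the bases observed at the linked SNP offsets of one read."""
    return "".join(read[i] for i in snp_offsets)


def call_haplotypes(reads, snp_offsets, min_reads=2):
    """Count haplotypes across single-molecule reads and drop weakly supported ones."""
    last = max(snp_offsets)
    counts = Counter(
        haplotype_of_read(read, snp_offsets)
        for read in reads
        if len(read) > last  # skip reads that do not cover every SNP
    )
    return {hap: n for hap, n in counts.items() if n >= min_reads}


# Toy reads with two linked SNPs at offsets 1 and 4.
reads = ["ACGTAC", "ACGTAC", "ATGTCC", "ATGTCC", "ATGTAC"]
print(call_haplotypes(reads, [1, 4]))
```

In a diploid sample, two well-supported haplotypes would be expected at a heterozygous segment; a haplotype seen in only one read may reflect a sequencing or amplification error.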

Thus, provided herein is a method for identifying a haplotype comprising two or more nucleotides, which comprises (a) providing a nucleic acid from a sample, wherein the nucleic acid is from one chromosome of a diploid organism; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined by part (e), determining the haplotype in the sample nucleic acid. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

8. Microsatellites

Processes and solid supports described herein allow for rapid, unambiguous detection of sequence variations that are microsatellites. Microsatellites (sometimes referred to as variable number of tandem repeats or VNTRs) are short tandemly repeated nucleotide units of one to seven or more bases, the most prominent among them being di-, tri-, and tetranucleotide repeats. Microsatellites are present every 100,000 bp in genomic DNA (J. L. Weber and P. E. May, Am. J. Hum. Genet. 44, 388 (1989); J. Weissenbach et al., Nature 359, 794 (1992)). CA dinucleotide repeats, for example, make up about 0.5% of the human extra-mitochondrial genome; CT and AG repeats together make up about 0.2%. CG repeats are rare, most probably due to the regulatory function of CpG islands. Microsatellites are highly polymorphic with respect to length and widely distributed over the whole genome with a main abundance in non-coding sequences, and their function within the genome is unknown.

Microsatellites are important in forensic applications, as a population will maintain a variety of microsatellites characteristic for that population and distinct from other populations, which do not interbreed. Many changes within microsatellites can be silent, but some can lead to significant alterations in gene products or expression levels. For example, trinucleotide repeats found in the coding regions of genes are affected in some tumors (C. T. Caskey et al., Science 256, 784 (1992)) and alteration of the microsatellites can result in a genetic instability that results in a predisposition to cancer (P. J. McKinnen, Hum. Genet. 75, 197 (1987); J. German et al., Clin. Genet. 35, 57 (1989)).

Thus, provided herein is a method for detecting a microsatellite sequence, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined by part (e), determining whether the microsatellite sequence is present in the sample nucleic acid. A microsatellite sequence may be a full microsatellite sequence or a portion of a full microsatellite sequence. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

9. Short Tandem Repeats

Processes and solid supports described herein can be used to identify short tandem repeat (STR) regions in some target sequences of the human genome relative to, for example, reference sequences in the human genome that do not contain STR regions. STR regions are polymorphic regions that are not related to any disease or condition. Many loci in the human genome contain a polymorphic short tandem repeat (STR) region. STR loci contain short, repetitive sequence elements of 3 to 7 base pairs in length. It is estimated that there are 200,000 expected trimeric and tetrameric STRs, which are present as frequently as once every 15 kb in the human genome (see, e.g., International PCT application No. WO 9213969 A1, Edwards et al., Nucl. Acids Res. 19:4791 (1991); Beckmann et al. (1992) Genomics 12:627-631). Nearly half of these STR loci are polymorphic, providing a rich source of genetic markers. Variation in the number of repeat units at a particular locus is responsible for the observed sequence variations reminiscent of variable number of tandem repeat (VNTR) loci (Nakamura et al. (1987) Science 235:1616-1622) and minisatellite loci (Jeffreys et al. (1985) Nature 314:67-73), which contain longer repeat units, and microsatellite or dinucleotide repeat loci (Luty et al. (1991) Nucleic Acids Res. 19:4308; Litt et al. (1990) Nucleic Acids Res. 18:4301; Litt et al. (1990) Nucleic Acids Res. 18:5921; Luty et al. (1990) Am. J. Hum. Genet. 46:776-783; Tautz (1989) Nucl. Acids Res. 17:6463-6471; Weber et al. (1989) Am. J. Hum. Genet. 44:388-396; Beckmann et al. (1992) Genomics 12:627-631).
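Since STR alleles differ in the number of repeat units at a locus, an allele can be described by counting uninterrupted copies of the repeat motif in a determined sequence. The following is a minimal sketch under simplifying assumptions: the motif is known in advance, repeats are perfect (no interruptions or sequencing errors), and the longest uninterrupted run is taken as the allele. The function name and toy sequences are hypothetical.

```python
import re


def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence)]
    return max(run_lengths, default=0) // len(motif)


# Toy read containing a tetranucleotide repeat flanked by other sequence.
read = "GGAATGAATGAATGAATGCC"
print(longest_tandem_run(read, "AATG"))
```

Comparing the repeat count across reads from one sample would distinguish alleles of different lengths at the same locus.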

Examples of STR loci include, but are not limited to, pentanucleotide repeats in the human CD4 locus (Edwards et al., Nucl. Acids Res. 19:4791 (1991)); tetranucleotide repeats in the human aromatase cytochrome P-450 gene (CYP19; Polymeropoulos et al., Nucl. Acids Res. 19:195 (1991)); tetranucleotide repeats in the human coagulation factor XIII A subunit gene (F13A1; Polymeropoulos et al., Nucl. Acids Res. 19:4306 (1991)); tetranucleotide repeats in the F13B locus (Nishimura et al., Nucl. Acids Res. 20:1167 (1992)); tetranucleotide repeats in the human c-fes/fps proto-oncogene (FES; Polymeropoulos et al., Nucl. Acids Res. 19:4018 (1991)); tetranucleotide repeats in the LPL gene (Zuliani et al., Nucl. Acids Res. 18:4958 (1990)); trinucleotide repeat sequence variations at the human pancreatic phospholipase A-2 gene (PLA2; Polymeropoulos et al., Nucl. Acids Res. 18:7468 (1990)); tetranucleotide repeat sequence variations in the VWF gene (Ploos et al., Nucl. Acids Res. 18:4957 (1990)); and tetranucleotide repeats in the human thyroid peroxidase (hTPO) locus (Anker et al., Hum. Mol. Genet. 1:137 (1992)).

Thus, provided herein is a method for detecting a short tandem repeat sequence, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined in (e), determining whether the short tandem repeat sequence is present in the sample nucleic acid. A short tandem repeat sequence may be a full STR sequence or a portion of a full STR sequence. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

10. Organism Identification

Processes and solid supports described herein can be utilized to identify polymorphic STR loci and other polymorphic regions useful for discriminating one organism from another. Certain polymorphic STR loci and other polymorphic regions of genes are sequence variations that are useful markers for human identification, paternity and maternity testing, genetic mapping, immigration and inheritance disputes, zygosity testing in twins, tests for inbreeding in humans, quality control of human cultured cells, identification of human remains, and testing of semen samples, blood stains, microbes and other material in forensic medicine. Such loci also are useful markers in commercial animal breeding and pedigree analysis and in commercial plant breeding. Traits of economic importance in plant crops and animals can be identified through linkage analysis using polymorphic DNA markers. Efficient and accurate methods for determining the identity of such loci are provided herein.

Thus, provided herein is a method for detecting a target nucleotide sequence of one organism, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined in (e), determining whether the target nucleotide sequence is present. If the presence of a first organism that resides in a second organism is being detected, a target nucleotide sequence present in nucleic acid from the first organism that is not present in nucleic acid of the second organism generally is selected (e.g., a nucleotide sequence in a pathogen nucleic acid that is not present in a human nucleic acid; a nucleotide sequence in a fetus nucleic acid that is not present in a maternal nucleic acid). In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.
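Selecting a target nucleotide sequence that is present in a first organism but absent from a second can be illustrated with a simple set difference over subsequences. This sketch is illustrative only and makes strong assumptions: exact matching of fixed-length subsequences (k-mers), a single strand, and toy sequences that stand in for real genomes. The function names are hypothetical.

```python
def kmers(sequence, k):
    """Set of all length-k subsequences of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def discriminating_kmers(first, second, k):
    """Subsequences of length k found in `first` but not in `second`."""
    return kmers(first, k) - kmers(second, k)


# Toy sequences: candidate targets specific to the first organism.
first_organism = "ACGTT"
second_organism = "ACGAA"
print(sorted(discriminating_kmers(first_organism, second_organism, 3)))
```

In practice, candidate targets would also need to be checked against the reverse complement and against near matches, which this sketch does not do.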

11. Detecting Allelic Variation

Processes and solid supports described herein allow for high-throughput, fast and accurate detection of allelic variants. Human populations are heterogeneous in terms of susceptibility to particular diseases or responses to therapeutic interventions. Increasing evidence suggests that allelic variation in gene expression is a widespread phenomenon, and may contribute to phenotypic variation between individuals. As more genomes are sequenced, the identification and characterization of the causes of heritable variation within a species will be increasingly important. Allelic variation can be observed between ethnic or regional populations, and within ethnic and regional populations. In some instances intra-population variation can be found within relatively small populations. Heritable allelic variation in gene expression may contribute to sporadic and familial disease, but is relatively unexplored. Understanding allelic variation may help provide insight into a number of genetic heterogeneity phenomena, including but not limited to genetic imprinting, disease susceptibility and therapeutic response.

Studies of allelic variation involve not only detection of a specific sequence in a complex background, but also discrimination between sequences with few, or single, nucleotide differences. Allelic variation studies can be performed on DNA or RNA, and thus correlations can be made between allelic variants, SNPs and expression levels. One method for detecting the degree of variation in allelic expression at any specific locus is to quantitatively genotype mRNA from individuals heterozygous for an exonic single nucleotide polymorphism (SNP) in the gene of interest. If there is no allelic variation in gene expression then the two alleles of the SNP should be expressed at the same level, but where there is allelic differential expression one allele will be found at a higher level than the other.
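The quantitative genotyping described above can be illustrated with a minimal sketch. It assumes that sequence reads covering the exonic SNP are already available as strings of equal orientation and that the SNP sits at a known offset in each read; the function name, the read strings and the offset are hypothetical and serve only to show the counting step.

```python
from collections import Counter

def allelic_ratio(reads, snp_offset, allele_a, allele_b):
    """Count reads carrying each allele at one SNP position.

    Returns (count_a, count_b, fraction_a). fraction_a is None when
    neither allele is observed. All inputs are illustrative assumptions.
    """
    # Tally the base seen at the SNP offset in every read long enough to cover it.
    counts = Counter(r[snp_offset] for r in reads if len(r) > snp_offset)
    count_a, count_b = counts[allele_a], counts[allele_b]
    total = count_a + count_b
    fraction_a = count_a / total if total else None
    return count_a, count_b, fraction_a

# Hypothetical reads from an individual heterozygous G/C at offset 2.
reads = ["ACGTA", "ACGTA", "ACCTA", "ACGTA"]
print(allelic_ratio(reads, 2, "G", "C"))
```

A fraction near 0.5 would be consistent with equal expression of the two alleles, whereas a fraction well away from 0.5 would suggest allelic differential expression; any threshold for calling a difference is left to the artisan.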

Thus, provided herein is a method for detecting a sequence variation in a target nucleotide sequence, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid of (d); and (f) based on a sequence determined in (e), determining whether a sequence variation in the target nucleotide sequence is present. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

12. Determining Allelic Frequency

Processes and solid supports described herein are useful for identifying one or more genetic markers whose frequency changes within the population as a function of age, ethnic group, sex or some other criterion. For example, the age-dependent distribution of ApoE genotypes is known in the art (see, Schechter et al. (1994) Nature Genetics 6:29-32). The frequencies of sequence variations known to be associated at some level with disease can also be used to detect or monitor progression of a disease state. For example, the N291S polymorphism of the lipoprotein lipase gene, which results in a substitution of a serine for an asparagine at amino acid codon 291, leads to reduced levels of high density lipoprotein cholesterol (HDL-C), which is associated with an increased risk in males of arteriosclerosis and in particular myocardial infarction (see, Reymer et al. (1995) Nature Genetics 10:28-34). In addition, determining changes in allelic frequency can allow the identification of previously unknown sequence variations and ultimately a gene or pathway involved in the onset and progression of disease.

Thus, provided herein is a method for determining the frequency of a target nucleotide sequence in a population of individuals, which comprises (a) providing a sample nucleic acid (e.g., taken from a subject); (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying the extended nucleic acid; (e) analyzing the sequence of the amplified nucleic acid; (f) identifying the presence or absence of the target nucleotide sequence according to the sequence analyzed in (e); and (g) repeating steps (a) to (f) for other individuals of the population and determining the frequency of the target nucleotide sequence in the population. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. In some embodiments methylated nucleotides in the sample nucleic acid may be converted to another nucleotide using methods known in the art, such as bisulfite conversion of methylated cytosine to uracil, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.
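Step (g) above reduces to a simple proportion once a presence or absence call has been made for each individual. The following is a minimal sketch under the assumption that the calls from step (f) are held in a dictionary keyed by a hypothetical individual identifier; the function name and the sample identifiers are illustrative only.

```python
def target_frequency(calls):
    """Return the fraction of individuals in which the target sequence was found.

    calls maps a (hypothetical) individual identifier to True when the target
    nucleotide sequence was identified in that individual's sample.
    """
    if not calls:
        raise ValueError("at least one individual is required")
    # True counts as 1 and False as 0, so the sum is the number of carriers.
    return sum(bool(present) for present in calls.values()) / len(calls)

# Hypothetical presence/absence calls for four individuals.
calls = {"subject_1": True, "subject_2": False, "subject_3": True, "subject_4": False}
print(target_frequency(calls))
```

This sketch counts individuals rather than chromosomes; counting alleles in diploid subjects would require genotype calls rather than presence or absence calls.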

13. Epigenetics

Processes and solid supports described herein can be used to study variations in a target nucleic acid or protein relative to a reference nucleic acid or protein that are not based on sequence, e.g., the identity of bases or amino acids that are the naturally occurring monomeric units of the nucleic acid or protein. For example, the specific cleavage reagents employed in the methods provided herein may recognize differences in sequence-independent features such as methylation patterns, the presence of modified bases or amino acids, or differences in higher order structure between the target molecule and the reference molecule, to generate fragments that are cleaved at sequence-independent sites. Epigenetics is the study of the inheritance of information based on differences in gene expression rather than differences in gene sequence. Epigenetic changes refer to mitotically and/or meiotically heritable changes in gene function or changes in higher order nucleic acid structure that cannot be explained by changes in nucleic acid sequence. Examples of features that are subject to epigenetic variation or change include, but are not limited to, DNA methylation patterns in animals, histone modification and the Polycomb-trithorax group (Pc-G/trx) protein complexes (see, e.g., Bird, A., Genes Dev., 16:6-21 (2002)).

Epigenetic changes usually, although not necessarily, lead to changes in gene expression that are usually, although not necessarily, inheritable. For example, as discussed further below, changes in methylation patterns sometimes may be an early event in cancer and other disease development and progression. In many cancers, certain genes are inappropriately switched off or switched on due to aberrant methylation. The ability of methylation patterns to repress or activate transcription can be inherited. The Pc-G/trx protein complexes, like methylation, can repress transcription in a heritable fashion. The Pc-G/trx multiprotein assembly is targeted to specific regions of the genome where it effectively freezes the embryonic gene expression status of a gene, whether the gene is active or inactive, and propagates that state stably through development. The ability of the Pc-G/trx group of proteins to target and bind to a genome affects only the level of expression of the genes contained in the genome, and not the properties of the gene products. The methods provided herein can be used with specific cleavage reagents that identify variations in a target sequence relative to a reference sequence that are based on sequence-independent changes, such as epigenetic changes.

Thus, provided herein is a method for the epigenetic analysis of a target nucleotide sequence, which comprises (a) providing a nucleic acid from a sample in which methylated nucleotides or non-methylated nucleotides have been converted to another nucleotide moiety; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) preparing an emulsion; (e) amplifying the extended nucleic acid; (f) analyzing the sequence of the amplified nucleic acid; and (g) based on a sequence determined in (f), comparing the methylation pattern of the target nucleic acid to the methylation pattern of a reference nucleic acid. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (f), without amplification (e), using sequencing by synthesis methods described above.

The term “another nucleotide moiety” as used herein refers to a nucleotide moiety other than the nucleotide that was methylated or non-methylated. The “another nucleotide moiety” may be naturally occurring or non-naturally occurring. Methylated nucleotides in the sample nucleic acid may be converted to another nucleotide moiety using methods known in the art, such as bisulfite conversion of methylated cytosine to uracil, for example.

14. Methylation Patterns

Processes and solid supports described herein can be used to detect sequence variations that are epigenetic changes in the target sequence, such as a change in methylation patterns in the target sequence. Analysis of cellular methylation is an emerging research discipline. The covalent addition of methyl groups to cytosine is primarily present at CpG dinucleotides (microsatellites). Although the function of CpG islands not located in promoter regions remains to be explored, CpG islands in promoter regions are of special interest because their methylation status regulates the transcription and expression of the associated gene. Methylation of promoter regions leads to silencing of gene expression. This silencing is permanent and continues through the process of mitosis. Due to its significant role in gene expression, DNA methylation has an impact on developmental processes, imprinting and X-chromosome inactivation as well as tumorigenesis, aging, and also suppression of parasitic DNA. Methylation is thought to be involved in the carcinogenesis of many widespread tumors, such as lung, breast, and colon cancer, and in leukemia. There is also a relation between methylation and protein dysfunctions (long Q-T syndrome) or metabolic diseases (transient neonatal diabetes, type 2 diabetes).

Bisulfite treatment of genomic DNA can be utilized to analyze positions of methylated cytosine residues within the DNA. Treating nucleic acids with bisulfite deaminates cytosine residues to uracil residues, while methylated cytosine remains unmodified. Thus, by comparing the sequence of a target nucleic acid that is not treated with bisulfite with the sequence of the nucleic acid that is treated with bisulfite in the methods provided herein, the degree of methylation in a nucleic acid as well as the positions where cytosine is methylated can be deduced.
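The comparison described above can be sketched in a few lines. The sketch assumes that the untreated and bisulfite-treated sequences are from the same strand, are of equal length and are already aligned position by position, and it accepts either U or T at a converted position because uracil is read as thymine after amplification. The function name and the example sequences are hypothetical.

```python
def methylation_calls(untreated, treated):
    """Compare an untreated sequence with its bisulfite-treated counterpart.

    Returns (methylated_positions, unmethylated_positions) as 0-based indices
    of cytosines in the untreated sequence. Inputs are assumed to be aligned.
    """
    if len(untreated) != len(treated):
        raise ValueError("sequences must be aligned and of equal length")
    methylated, unmethylated = [], []
    for i, (before, after) in enumerate(zip(untreated.upper(), treated.upper())):
        if before != "C":
            continue  # only cytosines are informative
        if after == "C":
            methylated.append(i)    # protected from deamination
        elif after in ("U", "T"):
            unmethylated.append(i)  # deaminated to uracil (read as T)
    return methylated, unmethylated

# Hypothetical sequences: the cytosine at index 1 survives bisulfite treatment.
meth, unmeth = methylation_calls("ACGTCCGA", "ACGTTTGA")
degree = len(meth) / (len(meth) + len(unmeth))
print(meth, unmeth, round(degree, 3))
```

Here the degree of methylation is expressed as the fraction of informative cytosines that remained unconverted, which is one of several reasonable ways to summarize the result.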

Methylation analysis via restriction endonuclease reaction is made possible by using restriction enzymes that have methylation-specific recognition sites, such as HpaII and MspI. The basic principle is that certain enzymes are blocked by methylated cytosine in the recognition sequence. Once this differentiation is accomplished, subsequent analysis of the resulting fragments can be performed using the methods as provided herein.

These methods can be used together in combined bisulfite restriction analysis (COBRA). Treatment with bisulfite causes a loss of a BstUI recognition site in amplified PCR product, which causes a new detectable fragment to appear on analysis compared to an untreated sample. The cleavage-based methods provided herein can be used in conjunction with specific cleavage of methylation sites to provide rapid, reliable information on the methylation patterns in a target nucleic acid sequence.

Thus, provided herein is a method for analyzing a methylation pattern of a target nucleotide sequence, which comprises (a) providing a nucleic acid from a sample in which methylated nucleotides or non-methylated nucleotides have been converted to another nucleotide moiety; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying the extended nucleic acid; (e) analyzing the sequence of the amplified nucleic acid; and (f) determining the methylation pattern based on the sequence in (e). In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

15. Resequencing

Processes and solid supports described herein are useful for rapid resequencing analyses. The dramatically growing amount of available genomic sequence information from various organisms increases the need for technologies allowing large-scale comparative sequence analysis to correlate sequence information to function, phenotype, or identity. The application of such technologies for comparative sequence analysis can be widespread, including SNP discovery and sequence-specific identification of pathogens. Therefore, resequencing and high-throughput mutation screening technologies are critical to the identification of mutations underlying disease, as well as the genetic variability underlying differential drug response.

Several approaches have been developed in order to satisfy these needs. A current technology for high-throughput DNA sequencing includes DNA sequencers using electrophoresis and laser-induced fluorescence detection. Electrophoresis-based sequencing methods have inherent limitations for detecting heterozygotes and are compromised by GC compressions. Thus a DNA sequencing platform that produces digital data without using electrophoresis will overcome these problems. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) measures DNA fragments with digital data output. The methods of specific cleavage fragmentation analysis provided herein allow for high-throughput, high speed and high accuracy in the detection of sequence variations relative to a reference sequence. This approach makes it possible to routinely use MALDI-TOF MS sequencing for accurate mutation detection, such as screening for founder mutations in BRCA1 and BRCA2, which are linked to the development of breast cancer.

Thus, the invention in part provides a method for resequencing a target nucleotide sequence, which comprises (a) providing a sample nucleic acid (e.g., taken from a subject); (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying the extended nucleic acid; (e) analyzing the sequence of the amplified nucleic acid; and (f) comparing a sequence determined in part (e) to a reference nucleotide sequence, whereby the target nucleotide sequence is resequenced. The reference sequence may be a nucleotide sequence already identified from the sample. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example.

16. Multiplexing

Processes and solid supports described herein can allow for the high-throughput detection or discovery of sequences in a plurality of target sequences. Multiplexing refers to the simultaneous detection of more than one sequence, polymorphism or sequence variation. Multiplexing allows the simultaneous processing of many sequencing templates by pooling these at the earliest stages of the preparation procedure and resolving them into individual sequences at the latest possible stage of the sequencing process, thus enabling a high throughput of templates with a reduction in repetitious steps. Methods for performing multiplexed reactions, particularly in conjunction with mass spectrometry, are known (see, e.g., U.S. Pat. Nos. 6,043,031, 5,547,835 and International PCT application No. WO 97/37041).

Multiplexing can be performed, for example, for the same target nucleic acid sequence using different complementary specific cleavage reactions as provided herein, or for different target nucleic acid sequences, and the cleavage patterns can in turn be analyzed against a plurality of reference nucleic acid sequences. Several mutations or sequence variations can also be simultaneously detected on one target sequence by employing the methods provided herein where each sequence variation corresponds to a different cleavage product relative to the cleavage pattern of the reference nucleic acid sequence. Multiplexing provides the advantage that a plurality of sequence variations can be identified in as few as a single mass spectrum, as compared to having to perform a separate mass spectrometry analysis for each individual sequence variation. The methods provided herein lend themselves to high-throughput, highly-automated processes for analyzing sequence variations with high speed and accuracy, with the added advantage of identification of sequences not normally readable using gel electrophoresis-based methods. In some embodiments multiplex sequence analysis can also be combined with other non-limiting methods commonly known in the art, such as DNA sequencing by exonuclease degradation, for example.

Thus, provided herein is a method for analyzing a target nucleotide sequence, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying the extended nucleic acid; (e) analyzing the sequence of the amplified nucleic acid; and (f) identifying two or more sequences in the sample nucleic acid. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.

17. Disease Outbreak Monitoring

Processes and solid supports described herein can be used to monitor disease outbreaks. In times of global transportation and travel, outbreaks of pathogenic endemics require close monitoring to prevent their worldwide spread and enable control. DNA based typing by high-throughput technologies (e.g., using DNA chips, DNA array technologies, and the like) enables a rapid sample throughput in a comparatively short time, as required in an outbreak situation. Currently, traditional methods of disease outbreak monitoring may take as long as 7 to 10 days to identify pathogenic microorganisms. Using high-throughput technologies may offer significant time savings in the critical initial stages of disease outbreak monitoring, by reducing identification times from 7 to 10 days to less than 2 days. Monitoring is performed by detecting one or more microbial marker regions (e.g., SNPs, unique regions of rRNA and the like) in one or more samples. A genus, species, strain or subtype of a microorganism can be monitored, using molecular markers to identify the presence or absence of nucleic acid sequences specific to known pathogenic microorganisms.

Thus, provided herein is a method for monitoring a disease outbreak, which comprises (a) providing a nucleic acid from a sample; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended solid phase nucleic acid; (e) analyzing the sequence of the amplified nucleic acid in (d); and (f) comparing a sequence determined in (e) to a reference sequence, whereby the disease outbreak is monitored. The disease outbreak may be monitored by determining whether (i) there are new sequences as determined in part (e) not present in a reference sample (e.g., indicating that new pathogens are present in a population as part of a disease outbreak) and (ii) there are fewer sequences as determined in part (e) than present in a reference sample (e.g., indicating certain pathogens no longer are a threat as part of the disease outbreak). A reference sequence may be from a sample taken from the same individual(s) at a different point in time (e.g., earlier point of time). In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above. In some embodiments multiplexed comparative sequence analysis in conjunction with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis can be used with embodiments described herein to monitor disease outbreaks.
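The comparison against a reference sample in parts (i) and (ii) above can be pictured as two set differences. The sketch below assumes that the sequences determined in part (e) have already been reduced to marker labels (hypothetical names such as "marker_A"); exact matching of labels stands in for whatever sequence comparison the artisan actually applies.

```python
def compare_to_reference(current, reference):
    """Report markers that are new in, or absent from, the current sample.

    current and reference are iterables of marker labels; the labels used
    here are hypothetical placeholders for determined sequences.
    """
    current, reference = set(current), set(reference)
    return {
        "new": sorted(current - reference),     # possible newly present pathogens
        "absent": sorted(reference - current),  # markers no longer detected
    }

# Hypothetical marker sets from an earlier (reference) and a later sample.
reference = ["marker_A", "marker_B"]
current = ["marker_B", "marker_C"]
print(compare_to_reference(current, reference))
```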

18. Vaccine Quality Control and Production Clone Quality Control

Processes and solid supports described herein can be used to control the identity of recombinant production clones, which can be vaccines or, e.g., insulin or any other production clone or biological or medical product. The entire sequence or one or more portions of a clone or vaccine can be analyzed in production samples and lots. Sequences determined by processes described herein can be compared to a reference sequence for the clone or vaccine to monitor quality control. Quality control monitoring of this type can allow detection of spontaneous mutations or genetic rearrangement at various stages of production in large scale bioreactors, thus allowing more efficient resource management by allowing early detection and shut down of processes which have been monitored and show deviation from the expected product.

Thus, provided herein is a method for determining the quality of a production vaccine or clone sample, which comprises (a) providing a production vaccine or clone sample nucleic acid; (b) preparing a mixture of the sample nucleic acid with a solid support described herein having solid phase nucleic acid under conditions in which a single molecule of the sample nucleic acid hybridizes to a solid support molecule; (c) contacting the mixture with extension agents under conditions in which solid phase nucleic acid hybridized to sample nucleic acid is extended; (d) amplifying extended nucleic acid; (e) analyzing the sequence of amplified nucleic acid of part (d); and (f) comparing a sequence determined by part (e) to a clone or vaccine reference sequence, whereby the quality of the production clone or vaccine is determined based upon the comparison in part (f). The comparison in part (f) may be the degree of identity between the entire sequence or subsequence of the production clone or vaccine to a corresponding sequence in the reference clone or vaccine. A reference sequence may be obtained from a different production lot or a progenitor clone or vaccine, for example. In some embodiments the sample may be processed before step (b), by purifying the nucleic acid in the sample and/or fragmenting the sample nucleic acid, for example. Part (d) is optional in certain embodiments: the extended solid phase nucleic acid of (c) may be analyzed by sequencing (e), without amplification (d), using sequencing by synthesis methods described above.
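The degree of identity mentioned for part (f) can be illustrated with a minimal sketch. It assumes the production sequence and the reference sequence are already aligned and of equal length, so that insertions, deletions and rearrangements are outside its scope; the function names and example sequences are hypothetical.

```python
def percent_identity(sample, reference):
    """Percentage of aligned positions at which the two sequences agree."""
    if not reference or len(sample) != len(reference):
        raise ValueError("sequences must be non-empty, aligned and of equal length")
    matches = sum(a == b for a, b in zip(sample, reference))
    return 100.0 * matches / len(reference)

def differing_positions(sample, reference):
    """0-based positions at which the sample deviates from the reference."""
    return [i for i, (a, b) in enumerate(zip(sample, reference)) if a != b]

# Hypothetical production-lot sequence with one substitution at the last base.
reference = "ACGTACGT"
sample = "ACGTACGA"
print(percent_identity(sample, reference), differing_positions(sample, reference))
```

An acceptance threshold for identity is not specified in the text above and would be set by the artisan for the particular product.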

EXAMPLES

The following example illustrates certain embodiments and does not limit the invention.

Example 1 Sequence Analysis Methodology

Described hereafter is methodology for performing nucleic acid sequence analyses described herein. Synthesized oligonucleotides containing probe, primer and identification sequences are linked to solid support (beads, slides, chips, and the like, and in some embodiments, Dynal® beads) commonly available in the art, via appropriate linkage chemistry. In some embodiments using Dynal® beads, carboxy-amino linkage chemistry may be used to link synthesized oligonucleotides to the beads. Synthesis of oligonucleotides is well known in the art, and a variety of methods to synthesize oligonucleotides and oligonucleotide libraries, including methods which incorporate modified or derivatized nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids (e.g., phosphorothioate derivatives and acridine substituted nucleotides), can be chosen. Nucleotide sequences for synthesized oligonucleotides may include any nucleic acid sequence(s) useful for biological or clinical investigation processes (e.g., SNPs, known probe sequences specific to pathogenic microorganisms and the like) including, but not limited to, those applications and uses described herein. In some embodiments synthesized oligonucleotides may be linked to solid support under dilute conditions such that one or only a few oligonucleotides are linked to each individual unit of solid support (1, 2, 3, 4, 5, or up to 10 linked synthesized oligonucleotides), when using beads or particles as solid support, for example. In some embodiments with more than one linked oligonucleotide, the oligonucleotides linked are not identical in sequence.

Sample nucleic acid is prepared and contacted with the synthesized oligonucleotides containing probe, primer and identification sequences linked to solid support (hereinafter referred to as solid phase oligonucleotides, or solid phase oligos). Sample nucleic acid may be prepared by any means commonly known in the art, including but not limited to cell lysis procedures using chemical, physical or electrolytic lysis methods. For example, chemical methods generally employ lysing agents to disrupt the cells and extract the nucleic acids from the cells, followed by treatment with chaotropic salts. Physical methods such as freeze/thaw followed by grinding, the use of cell presses and the like are also useful if intact proteins are desired. High salt lysis procedures are also commonly used. These procedures can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6 (1989), incorporated herein in its entirety.

Sample nucleic acid may be further manipulated or prepared following cell lysis and nucleic acid isolation. Methods of nucleic acid preparation including, but not limited to, shearing, size fractionation, purification, methylation or demethylation, restriction nuclease treatment, addition of nucleotides or linkers (defined herein as short oligomers of nucleotides of specific or non-specific sequence), incorporation of detectable label and the like, may be used to prepare sample nucleic acids for contact with solid phase oligonucleotides. For example, genomic sample DNA may be sheared, diluted and mixed with a molar excess of beads, in some embodiments. Genomic DNA mixed, under dilute conditions, with a molar excess of solid phase oligos (using molar ratios described above), enables binding of one sample nucleic acid molecule to one bead, in some embodiments. Sample nucleic acid and solid phase oligos can be hybridized under any conditions useful for hybridization known in the art, non-limiting examples of which are described above. Following hybridization, the solid phase oligo/sample nucleic acid complex may be isolated, in some embodiments. Isolation of these complexes may allow removal of possible amplification step contaminants.
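The effect of mixing sample nucleic acid with a molar excess of beads can be illustrated numerically. The sketch below rests on an assumption not stated in the text, namely that sample molecules distribute over beads independently so that the number of molecules per bead follows a Poisson distribution; the molecule-to-bead ratio of 0.1 is likewise a hypothetical value chosen for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k molecules on a bead under the Poisson assumption."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def loading_summary(molecules_per_bead):
    """Fractions of beads carrying zero, exactly one, or more than one molecule."""
    p0 = poisson_pmf(0, molecules_per_bead)
    p1 = poisson_pmf(1, molecules_per_bead)
    p_multi = 1.0 - p0 - p1
    # Among beads that carry anything, how many carry a single molecule?
    single_among_occupied = p1 / (1.0 - p0)
    return p0, p1, p_multi, single_among_occupied

# Hypothetical ratio: one sample molecule per ten beads.
p0, p1, p_multi, single = loading_summary(0.1)
print(f"empty={p0:.4f} single={p1:.4f} multiple={p_multi:.4f} single|occupied={single:.4f}")
```

Under this assumption, lowering the molecule-to-bead ratio raises the share of occupied beads that carry a single molecule, at the cost of more empty beads.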

The beads and sample DNA may be mixed with polymerase chain reaction components and the mixture may be emulsified with mineral oil (e.g., Margulies et al., “Genome sequencing in open microfabricated high density picoliter reactors,” Nature: 376-380 (2005); Kojima et al., “PCR amplification from single DNA molecules on magnetic beads in emulsion: application for high-throughput screening of transcription factor targets,” Nucleic Acids Research 33(17) (2005)), under conditions that allow extension and amplification (linear or exponential, as required by the artisan) of the solid phase oligonucleotide using hybridized sample nucleic acid as the template. The extension of solid phase oligos, using sample nucleic acid as a template, results in an extended solid phase nucleic acid that is substantially complementary (i.e., antisense) to the sample nucleic acid.

After solid phase nucleic acids are extended, the extended nucleic acids may be sequenced by any sequencing protocol known in the art including, but not limited to, sequencing methodologies described above (e.g., sequencing by ligation, pyrosequencing, sequencing by synthesis) or as described in Bentley, “Whole genome resequencing,” Curr Opin Genet Dev 16(6):545-52 (2006); Shendure et al., “Accurate multiplex polony sequencing of an evolved bacterial genome,” Science 309(5741):1728-32 (2005); Ju et al., “Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators,” Proc Natl Acad Sci USA 103(52):19635-40 (2006), for example. Raw sequence data may be stored for later sequence assembly and analysis.

Sequence reads may be assembled using assembly algorithms into full-length sequences (e.g., Warren et al., “Assembling millions of short DNA sequences using SSAKE,” Bioinformatics 23(4):500-1 (2006); Jeck et al., “Extending assembly of short DNA sequences to handle error,” Bioinformatics 23(21):2942-4 (2007)). After analyzing sequence data, specific determinations may be made according to the method, process, or application performed by the artisan (e.g., detection of the presence or absence of pathogenic organisms, quality control for bio-reactor processes, allelic frequency determinations, detection of allelic variations in gene expression and the like).

The example described above may be used to detect, identify, and sequence viral nucleic acids found in viral mixtures (e.g., finding hepatitis B genotypes or serotypes in a hepatitis viral mixture), or mixed viral populations as might be found in samples isolated from environmental sources or from immuno-deficient organisms. The methodology for detecting, identifying, and sequencing viral nucleic acids is substantially similar to that described above for nucleic acid sequencing.

The entirety of each patent, patent application, publication and document referenced herein hereby is incorporated by reference. Citation of the above patents, patent applications, publications and documents is not an admission that any of the foregoing is pertinent prior art, nor does it constitute any admission as to the contents or date of these publications or documents.

Modifications may be made to the foregoing without departing from the basic aspects of the invention. Although the invention has been described in substantial detail with reference to one or more specific embodiments, those of ordinary skill in the art will recognize that changes may be made to the embodiments specifically disclosed in this application, yet these modifications and improvements are within the scope and spirit of the invention.

The invention illustratively described herein suitably may be practiced in the absence of any element(s) not specifically disclosed herein. Thus, for example, in each instance herein any of the terms “comprising,” “consisting essentially of,” and “consisting of” may be replaced with either of the other two terms. The terms and expressions which have been employed are used as terms of description and not of limitation, and use of such terms and expressions does not exclude any equivalents of the features shown and described or portions thereof, and various modifications are possible within the scope of the invention claimed. The term “a” or “an” can refer to one of or a plurality of the elements it modifies (e.g., “a reagent” can mean one or more reagents) unless it is contextually clear either one of the elements or more than one of the elements is described. The term “about” as used herein refers to a value within 10% of the underlying parameter (i.e., plus or minus 10%), and use of the term “about” at the beginning of a string of values modifies each of the values (i.e., “about 1, 2 and 3” is about 1, about 2 and about 3). For example, a weight of “about 100 grams” can include weights between 90 grams and 110 grams. Thus, it should be understood that although the present invention has been specifically disclosed by representative embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and such modifications and variations are considered within the scope of this invention.

Embodiments of the invention are set forth in the claim(s) that follow(s).

1. (canceled)
 2. A solid support comprising single-stranded solid phase nucleic acid species, wherein: (a) each solid phase nucleic acid species of the solid support comprises an identifier sequence and a probe sequence; (b) the probe sequence is complementary to a nucleotide sequence in a sample nucleic acid; (c) the solid phase nucleic acid species of the solid support share a common probe sequence or do not share a common probe sequence; and (d) the solid phase nucleic acid species of the solid support share a common identifier sequence.
 3. The solid support of claim 2, wherein: (a) the solid support is in a collection of solid supports; (b) at least one molecule of the solid phase nucleic acid species of each of the solid supports in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid species of the other solid supports; and (c) the solid phase nucleic acid species of each of the solid supports in the collection share a unique identifier sequence different than the identifier sequences of the solid phase nucleic acid species of the other solid supports in the collection.
 4. The solid support of claim 2, wherein each solid phase nucleic acid species of the solid support further comprises a primer sequence, wherein the primer sequence, the identifier sequence and the probe sequence are oriented 5′-(primer sequence)-(identifier sequence)-(probe sequence)-3′, wherein the 5′ terminal nucleotide of the solid phase nucleic acid is linked to the solid support.
 5. The solid support of claim 3, wherein each solid phase nucleic acid species of each of the solid supports in the collection further comprises a primer sequence, wherein the primer sequence, the identifier sequence and the probe sequence are oriented 5′-(primer sequence)-(identifier sequence)-(probe sequence)-3′, wherein the 5′ terminal nucleotide of the solid phase nucleic acid is linked to the solid support.
 6. The solid support of claim 4, wherein the solid phase nucleic acid species share a common primer sequence.
 7. The solid support of claim 6, wherein the primer sequence is a universal primer sequence.
 8. The solid support of claim 5, wherein the solid phase nucleic acid species of all the solid supports in the collection share a common primer sequence.
 9. The solid support of claim 8, wherein the primer sequence is a universal primer sequence.
 10. The solid support of claim 2, wherein the solid phase nucleic acid species share a common probe sequence.
 11. The solid support of claim 2, wherein the solid phase nucleic acid species do not share a common probe sequence.
 12. The solid support of claim 2, wherein the solid support is a bead or particle.
 13. The solid support of claim 12, wherein the solid support is a microbead, a nanobead, a microparticle or a nanoparticle.
 14. The solid support of claim 12, wherein the bead or particle is a gel.
 15. A substrate comprising a collection of the beads of claim 12, wherein the beads are oriented in an array and wherein: (a) at least one solid phase nucleic acid species of each of the beads in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid species of the other beads in the collection; and (b) the solid phase nucleic acid species of each bead in the collection share a unique identifier sequence different than the identifier sequences of the solid phase nucleic acid species of the other beads of the collection.
 16. A method of manufacturing the solid support of claim 2, which comprises (a) sequentially linking nucleotides to a nucleotide covalently linked to the solid support, whereby each of the single-stranded solid phase nucleic acid species is prepared and is in association with the solid support; or (b) linking each single-stranded nucleic acid species in solution phase to the solid support, whereby the single-stranded solid phase nucleic acid species are in association with the solid support; wherein: (i) each single-stranded solid phase nucleic acid species comprises an identifier sequence and a probe sequence; (ii) the solid phase nucleic acid species share a common probe sequence or the solid phase nucleic acid species do not share a common probe sequence; and (iii) the solid phase nucleic acid species share a common identifier sequence.
 17. The method of claim 16, wherein: (a) the solid support is in a collection of solid supports; (b) at least one molecule of the solid phase nucleic acid species of each of the solid supports in the collection has a unique probe sequence different than a probe sequence of the solid phase nucleic acid species of the other solid supports; and (c) the solid phase nucleic acid species of each of the solid supports in the collection share a unique identifier sequence different than the identifier sequences of the solid phase nucleic acid species of the other solid supports in the collection. 